

Self-Speculative Decoding Implementation: LayerSkip Transformer

Last Updated on 2024-11-12 by Clay

Introduction

Self-Speculative Decoding is a variant of Speculative Decoding. The original Speculative Decoding method uses a small draft model to accelerate inference of a larger target model: the draft model, typically distilled from the target model, produces similar outputs at several times the inference speed, and the target model then verifies the drafted tokens, accepting those that match its own predictions. In the self-speculative variant used by LayerSkip, no separate draft model is needed; the target model itself drafts tokens by exiting early, skipping its later layers, and then verifies those drafts with a full forward pass.
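The draft-then-verify loop can be sketched in a model-agnostic way. Below is a minimal greedy sketch, not the post's actual implementation: `draft_next` and `target_next` are hypothetical callables standing in for the early-exit pass and the full forward pass; in a real LayerSkip setup both would be the same network run to different depths, and verification would happen in a single batched pass rather than a Python loop.

```python
def speculative_decode(target_next, draft_next, prompt, max_new_tokens, k=4):
    """Greedy speculative decoding sketch.

    target_next / draft_next: functions mapping a token list to the next token
    (stand-ins for the full model and the cheap early-exit draft).
    Drafts k tokens with the cheap model, verifies them against the target,
    and keeps the longest matching prefix plus one token from the target.
    """
    tokens = list(prompt)
    total = len(prompt) + max_new_tokens
    while len(tokens) < total:
        # 1. Draft: the cheap model proposes k tokens autoregressively.
        draft = []
        for _ in range(k):
            draft.append(draft_next(tokens + draft))
        # 2. Verify: the target checks each drafted position
        #    (done in one parallel forward pass in practice).
        accepted = 0
        for i in range(k):
            if target_next(tokens + draft[:i]) == draft[i]:
                accepted += 1
            else:
                break
        tokens.extend(draft[:accepted])
        if len(tokens) < total:
            # 3. Append the target's own token at the first mismatch
            #    (or the free "bonus" token when every draft was accepted).
            tokens.append(target_next(tokens))
    return tokens[:total]


# Toy usage with integer "tokens": the target counts mod 10; the draft
# agrees everywhere except after token 5, forcing a rejection there.
target = lambda seq: (seq[-1] + 1) % 10
draft = lambda seq: 0 if seq[-1] == 5 else (seq[-1] + 1) % 10
out = speculative_decode(target, draft, [0], 9)  # → [0, 1, 2, ..., 9]
```

Because every accepted token is checked against the target's own greedy prediction, the output is identical to decoding with the target alone; the speedup comes from the target verifying several drafted tokens per forward pass instead of generating one at a time.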
