Self-Speculative Decoding Implementation: LayerSkip Transformer
Last Updated on 2024-11-12 by Clay
Introduction
Self-Speculative Decoding is a variant of Speculative Decoding. The original Speculative Decoding method uses a small draft model to accelerate inference of a large target model: the draft model, typically distilled from the target model, produces outputs of similar quality while running several times faster, so it drafts candidate tokens that the target model then verifies.
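The draft-then-verify loop can be sketched as follows. This is a minimal toy illustration, not the paper's implementation: `draft_next` and `target_next` are hypothetical stand-ins for the draft and target models, and verification is done greedily (accept a drafted token only if it matches the target's own greedy choice), which guarantees output identical to decoding with the target model alone.

```python
def draft_next(token):
    # Toy "draft model": a cheap deterministic next-token rule (hypothetical).
    return (token * 3 + 1) % 50

def target_next(token):
    # Toy "target model": agrees with the draft most of the time (hypothetical).
    return draft_next(token) if token % 7 != 0 else (token + 2) % 50

def speculative_decode(start, n_tokens, k=4):
    """Generate n_tokens after `start`: the draft proposes k tokens per round,
    the target verifies them and keeps the longest matching prefix."""
    out = [start]
    while len(out) <= n_tokens:
        # 1. Draft model proposes k tokens autoregressively (cheap).
        proposals, cur = [], out[-1]
        for _ in range(k):
            cur = draft_next(cur)
            proposals.append(cur)
        # 2. Target model verifies each proposal; accept until the first mismatch.
        cur = out[-1]
        for p in proposals:
            expected = target_next(cur)
            if p == expected:
                out.append(p)       # accepted draft token
                cur = p
            else:
                out.append(expected)  # rejected: substitute the target's token
                cur = expected
                break
    return out[:n_tokens + 1]
```

The speedup comes from step 2: in a real transformer, the target model can score all `k` drafted tokens in a single batched forward pass instead of `k` sequential ones, while the greedy acceptance rule keeps the output distribution unchanged.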