Self-Speculative Decoding Implementation: LayerSkip Transformer

Last Updated on 2024-11-12 by Clay

Introduction

Self-Speculative Decoding is a variant of Speculative Decoding. The original Speculative Decoding method uses a draft model to accelerate inference of a target model: the draft model, typically distilled from the target model, produces output of similar quality at several times the inference speed. Once …
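The draft-then-verify idea behind speculative decoding can be sketched as follows. This is a minimal toy, not the paper's implementation: `draft_model` and `target_model` are hypothetical callables that greedily map a token sequence to the next token, and acceptance is exact-match rather than the rejection-sampling scheme used in the original speculative decoding papers.

```python
def speculative_step(prefix, draft_model, target_model, k=4):
    """Propose k draft tokens, then keep the longest prefix the target agrees with."""
    # 1) Draft phase: the cheap model proposes k tokens autoregressively.
    draft = list(prefix)
    for _ in range(k):
        draft.append(draft_model(draft))
    proposed = draft[len(prefix):]

    # 2) Verify phase: the target model checks each proposed token in turn.
    accepted = list(prefix)
    for tok in proposed:
        expected = target_model(accepted)
        if tok == expected:
            accepted.append(tok)       # draft token matches: kept almost for free
        else:
            accepted.append(expected)  # first mismatch: take the target's token, stop
            break
    else:
        # All k draft tokens accepted; the target still contributes one more token.
        accepted.append(target_model(accepted))
    return accepted

# Toy models over integer tokens: the draft agrees with the target on short contexts.
target = lambda seq: (seq[-1] + 1) % 10
drafter = lambda seq: (seq[-1] + 1) % 10 if len(seq) < 6 else 0

print(speculative_step([1, 2], drafter, target, k=4))  # → [1, 2, 3, 4, 5, 6, 7]
```

The speedup comes from the verify phase: when the draft is right, the target confirms several tokens per pass instead of generating one token per pass. In Self-Speculative Decoding (LayerSkip), the separate draft model is replaced by an early-exit pass through a subset of the target model's own layers.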