[Paper Reading] Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding

Last Updated on 2024-11-16 by Clay

Highlights of This Paper

Abstract

The researchers propose Self-Speculative Decoding, a variation on the original Speculative Decoding. The original method has two stages: drafting and verification. During the drafting stage, a smaller draft model, which shares the same vocabulary as the target model …
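To make the two stages concrete, here is a minimal sketch of one way a draft-then-verify loop can look under greedy decoding. It is not the paper's implementation. The names `speculative_step`, `generate`, `toy_target`, and `toy_draft` are hypothetical, and the "models" are plain Python functions that map a token list to the next token. The sketch also assumes a separate draft function and does not model how Self-Speculative Decoding obtains its draft.

```python
from typing import Callable, List

Token = int
# Hypothetical interface: a "model" is any function returning the greedy next token.
NextTokenFn = Callable[[List[Token]], Token]


def speculative_step(target: NextTokenFn, draft: NextTokenFn,
                     prefix: List[Token], k: int) -> List[Token]:
    """Run one draft-then-verify round and return the newly accepted tokens."""
    # Drafting stage: the cheap model proposes k tokens autoregressively.
    drafted: List[Token] = []
    for _ in range(k):
        drafted.append(draft(prefix + drafted))

    # Verification stage: compare each drafted token with the target's choice.
    # A real system would score all k positions in a single forward pass;
    # this sketch calls the target once per position for readability.
    accepted: List[Token] = []
    for tok in drafted:
        expected = target(prefix + accepted)
        if tok != expected:
            # First mismatch: keep the target's token and discard the rest.
            accepted.append(expected)
            return accepted
        accepted.append(tok)

    # Every drafted token matched, so the target contributes one extra token.
    accepted.append(target(prefix + accepted))
    return accepted


def generate(target: NextTokenFn, draft: NextTokenFn,
             prompt: List[Token], max_new_tokens: int, k: int = 4) -> List[Token]:
    """Generate tokens by repeating draft-then-verify rounds."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new_tokens:
        out.extend(speculative_step(target, draft, out, k))
    return out[:len(prompt) + max_new_tokens]


def greedy(target: NextTokenFn, prompt: List[Token], max_new_tokens: int) -> List[Token]:
    """Plain token-by-token greedy decoding with the target, used as a reference."""
    out = list(prompt)
    for _ in range(max_new_tokens):
        out.append(target(out))
    return out


def toy_target(seq: List[Token]) -> Token:
    # Toy stand-in for the target model (an assumption for illustration only).
    return (sum(seq) + len(seq)) % 7


def toy_draft(seq: List[Token]) -> Token:
    # Toy stand-in for the draft model: agrees with the target except
    # when the sequence length is a multiple of 3.
    guess = toy_target(seq)
    return (guess + 1) % 7 if len(seq) % 3 == 0 else guess


if __name__ == "__main__":
    prompt = [1, 2]
    print(generate(toy_target, toy_draft, prompt, max_new_tokens=10))
    print(greedy(toy_target, prompt, max_new_tokens=10))
```

In this sketch, every token that survives verification is the token the target would have chosen itself, so the result matches plain greedy decoding with the target. The draft only affects how many tokens are accepted per round, not which tokens come out.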