June 3, 2024

[Paper Reading] Kangaroo: Lossless Self-Speculative Decoding via Double Early Exiting

Last Updated on 2024-07-25 by Clay

Introduction

Kangaroo is an acceleration framework proposed by Huawei Noah's Ark Lab. It replaces the small draft model used in the original speculative decoding with a shallow sub-network of the large model itself. An additionally trained adapter and the model's own decoding head generate the speculative tokens, which the large model then verifies. The subsequent steps are much the same as in the original speculative decoding process.
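
To make the draft-then-verify flow concrete, here is a minimal toy sketch in plain Python. It is not the paper's implementation: the "layers", `embed`, `lm_head`, and the adapter are hypothetical integer functions standing in for real transformer components, verification is done token by token rather than in one parallel forward pass, and only greedy decoding is covered. What it does show is the structure described above: the first `exit_layer` layers plus an adapter and the shared head propose tokens, and the full stack accepts or corrects them, so the output matches plain greedy decoding of the full model.

```python
from typing import Callable, List

Layer = Callable[[int], int]
VOCAB = 50    # toy vocabulary size (assumption)
MOD = 101     # toy hidden-state range (assumption)

def make_layers(n: int) -> List[Layer]:
    # Hypothetical "layers": each maps an integer hidden state to another one.
    return [lambda h, i=i: (h * 3 + i + 1) % MOD for i in range(n)]

def embed(tokens: List[int]) -> int:
    # Toy stand-in for encoding the whole context into one hidden state.
    h = 0
    for t in tokens:
        h = (h * 31 + t + 7) % MOD
    return h

def lm_head(h: int) -> int:
    # Decoding head shared by the draft path and the full model.
    return h % VOCAB

def full_next(layers: List[Layer], tokens: List[int]) -> int:
    # Greedy next token from the full model (all layers).
    h = embed(tokens)
    for layer in layers:
        h = layer(h)
    return lm_head(h)

def draft_next(layers: List[Layer], adapter: Layer,
               tokens: List[int], exit_layer: int) -> int:
    # Early exit: shallow sub-network -> adapter -> shared head.
    h = embed(tokens)
    for layer in layers[:exit_layer]:
        h = layer(h)
    return lm_head(adapter(h))

def speculative_step(layers: List[Layer], adapter: Layer, tokens: List[int],
                     exit_layer: int, k: int) -> List[int]:
    # 1) Draft k tokens with the shallow path.
    draft: List[int] = []
    for _ in range(k):
        draft.append(draft_next(layers, adapter, tokens + draft, exit_layer))
    # 2) Verify with the full model; stop at the first mismatch.
    accepted: List[int] = []
    for tok in draft:
        target = full_next(layers, tokens + accepted)
        accepted.append(target)      # equals tok when the draft was right
        if tok != target:
            return accepted          # rejected: keep the correction, drop the rest
    # All drafts accepted: the full model contributes one extra token.
    accepted.append(full_next(layers, tokens + accepted))
    return accepted

def generate_speculative(layers: List[Layer], adapter: Layer, prompt: List[int],
                         n_new: int, exit_layer: int = 2, k: int = 4) -> List[int]:
    tokens = list(prompt)
    while len(tokens) - len(prompt) < n_new:
        tokens += speculative_step(layers, adapter, tokens, exit_layer, k)
    return tokens[:len(prompt) + n_new]

def generate_greedy(layers: List[Layer], prompt: List[int], n_new: int) -> List[int]:
    # Reference: ordinary greedy decoding with the full model only.
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(full_next(layers, tokens))
    return tokens
```

Calling `generate_speculative(make_layers(6), lambda h: (h + 5) % 101, [3, 14, 15], 12)` returns the same sequence as `generate_greedy(make_layers(6), [3, 14, 15], 12)`, whatever adapter is plugged in. The adapter only changes how many drafted tokens get accepted per step, which is where the speed-up would come from in a real system.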
