Kangaroo: Inference Acceleration Architecture Implementation
Last Updated on 2024-12-10 by Clay
Introduction
Kangaroo is an implementation of self-speculative decoding that introduces a trainable adapter layer. Over the past few weeks, I have been fine-tuning this adapter layer and have obtained some preliminary results, which I document here.