Kangaroo: Inference Acceleration Architecture Implementation
Introduction
Kangaroo is an implementation of Self-Speculative Decoding that introduces a trainable adapter layer. Over the past few weeks, I have been working on fine-tuning its adapter layer and have achieved some preliminary results, which I am documenting here.
Read More »Kangaroo: Inference Acceleration Architecture Implementation