Using the `assistant_model` parameter in HuggingFace's `transformers` library to accelerate generation with Speculative Decoding
Last Updated on 2024-11-20 by Clay
Recently, I have been implementing various speculative decoding acceleration methods. HuggingFace's `transformers` library provides a corresponding acceleration feature of its own, exposed through the `assistant_model` argument of `generate()`, so I am taking this opportunity to document it.