Using the `assistant_model` argument in HuggingFace's `transformers` library for speculative decoding
Recently, I have been implementing various speculative decoding acceleration methods. HuggingFace's `transformers`
library provides a built-in version of this feature: you pass a smaller draft model to `generate()` through the
`assistant_model` argument. I am taking this opportunity to document it.
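
As background, here is a minimal toy sketch of the idea behind greedy speculative decoding, written in plain Python. It is not how `transformers` implements the feature: the "models" are hypothetical stand-in functions that map a token list to the next token, and the names `greedy_decode`, `assisted_greedy_decode`, `toy_target`, `toy_draft`, and `num_draft_tokens` are made up for illustration. The point it tries to show is that the draft only proposes tokens, the target checks them, and the final output matches what the target would have produced on its own.

```python
from typing import Callable, List

# A toy "model": takes the tokens so far, returns the next token (greedy).
NextToken = Callable[[List[int]], int]


def greedy_decode(model: NextToken, prompt: List[int], max_new_tokens: int) -> List[int]:
    """Plain greedy decoding: one model call per generated token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(model(tokens))
    return tokens


def assisted_greedy_decode(
    target: NextToken,
    draft: NextToken,
    prompt: List[int],
    max_new_tokens: int,
    num_draft_tokens: int = 4,  # hypothetical knob: tokens proposed per round
) -> List[int]:
    """Greedy speculative decoding with a cheap draft and an expensive target."""
    tokens = list(prompt)
    limit = len(prompt) + max_new_tokens
    while len(tokens) < limit:
        # 1. The draft model proposes up to num_draft_tokens tokens.
        k = min(num_draft_tokens, limit - len(tokens))
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))

        # 2. The target model verifies the proposal position by position.
        #    (A real model would score all positions in one forward pass;
        #    this toy version calls the target once per position.)
        accepted: List[int] = []
        for tok in proposal:
            expected = target(tokens + accepted)
            if tok != expected:
                # First mismatch: keep the target's token, drop the rest.
                accepted.append(expected)
                break
            accepted.append(tok)
        else:
            # Every draft token matched, so the target adds one more token
            # if there is still room.
            if len(tokens) + len(accepted) < limit:
                accepted.append(target(tokens + accepted))

        tokens.extend(accepted)
    return tokens


def toy_target(tokens: List[int]) -> int:
    """Stand-in for the large model."""
    return (tokens[-1] * 3 + 1) % 7


def toy_draft(tokens: List[int]) -> int:
    """Stand-in for the small model: agrees with the target except after token 5."""
    if tokens[-1] == 5:
        return 0
    return (tokens[-1] * 3 + 1) % 7


if __name__ == "__main__":
    baseline = greedy_decode(toy_target, [1], 12)
    assisted = assisted_greedy_decode(toy_target, toy_draft, [1], 12)
    print(baseline == assisted)
```

Because every token that ends up in the output is one the target itself would have chosen at that position, the draft model affects only speed, not the result, in this greedy setting.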