November 22, 2024

Using The Target Model's Confidence Threshold To Decide Whether To Enable Speculative Decoding

Last Updated on 2024-11-22 by Clay

Many of the inference acceleration techniques I have studied, such as Speculative Decoding, apply a threshold to the draft model's confidence scores. This threshold determines how many draft tokens are decoded before they are passed to the target model for verification, which reduces wasted computation when the draft model has low confidence in its own predictions.
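
To make the draft-side threshold concrete, here is a minimal sketch in plain Python. It is not taken from any particular library: `draft_tokens`, `toy_draft_probs`, the default threshold of 0.6, and the choice to use the top-1 probability as the "confidence" are all assumptions made for illustration. The draft model is represented by a hypothetical callable that returns a probability distribution over the vocabulary for a given context.

```python
from typing import Callable, List, Sequence


def draft_tokens(
    draft_probs: Callable[[Sequence[int]], Sequence[float]],
    prefix: Sequence[int],
    threshold: float = 0.6,      # assumed value, tune per model
    max_draft_tokens: int = 5,   # assumed upper bound on the draft length
) -> List[int]:
    """Greedily draft tokens until the draft model's confidence drops.

    Confidence is taken to be the probability of the most likely token
    (an assumption; other definitions, such as entropy, are possible).
    """
    context = list(prefix)
    drafted: List[int] = []
    for _ in range(max_draft_tokens):
        probs = draft_probs(context)
        # Greedy choice: index of the highest-probability token.
        token = max(range(len(probs)), key=probs.__getitem__)
        confidence = probs[token]
        if confidence < threshold:
            # Stop drafting early and hand what we have to the target model.
            break
        drafted.append(token)
        context.append(token)
    return drafted


def toy_draft_probs(context: Sequence[int]) -> List[float]:
    """Hypothetical stand-in for a draft model over a 3-token vocabulary."""
    table = {
        1: [0.1, 0.9, 0.0],    # confident about token 1
        2: [0.2, 0.1, 0.7],    # fairly confident about token 2
        3: [0.4, 0.35, 0.25],  # unsure: top probability is only 0.4
    }
    return table.get(len(context), [1 / 3, 1 / 3, 1 / 3])


if __name__ == "__main__":
    print(draft_tokens(toy_draft_probs, prefix=[0], threshold=0.6))
```

In this toy setup the draft stops after two tokens, because the third step's top probability (0.4) falls below the threshold. The tokens returned would then go to the target model for verification; that step is omitted here.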
