Using The Target Model's Confidence Threshold To Decide Whether To Enable Speculative Decoding

Last Updated on 2024-11-22 by Clay

Many of the inference acceleration techniques I have studied, such as Speculative Decoding, apply a threshold to the draft model's confidence scores. This threshold determines how many draft tokens are decoded before they are passed to the target model for verification, thereby reducing the extra computational …
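To make the draft-side threshold concrete, here is a minimal sketch of the idea described above: keep drafting while the draft model stays confident, and stop as soon as its confidence falls below the threshold. The names `draft_until_uncertain`, `make_scripted_draft`, `threshold`, and `max_draft_tokens` are hypothetical, and the "draft model" is a toy that replays scripted `(token, confidence)` pairs. It is not tied to any real library or to a specific implementation.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

# Assumed interface: given the tokens so far, return (next_token, confidence).
DraftStep = Callable[[Sequence[int]], Tuple[int, float]]


def draft_until_uncertain(
    draft_step: DraftStep,
    prefix: Sequence[int],
    threshold: float = 0.8,
    max_draft_tokens: int = 8,
) -> List[int]:
    """Collect draft tokens until the draft confidence drops below `threshold`."""
    context = list(prefix)
    drafted: List[int] = []
    for _ in range(max_draft_tokens):
        token, confidence = draft_step(context)
        if confidence < threshold:
            # The draft model is unsure, so stop and hand the drafted
            # tokens to the target model for verification.
            break
        drafted.append(token)
        context.append(token)
    return drafted


def make_scripted_draft(script: Iterable[Tuple[int, float]]) -> DraftStep:
    """Toy stand-in for a draft model: replays scripted (token, confidence) pairs."""
    steps = iter(script)

    def step(context: Sequence[int]) -> Tuple[int, float]:
        return next(steps)

    return step


if __name__ == "__main__":
    script = [(11, 0.95), (12, 0.90), (13, 0.60), (14, 0.99)]
    tokens = draft_until_uncertain(make_scripted_draft(script), prefix=[1, 2, 3])
    print(tokens)  # drafting stops at the third step, where confidence is 0.60
```

In this sketch a higher `threshold` produces shorter drafts and therefore more frequent verification passes, while `max_draft_tokens` caps the draft length even when the draft model stays confident.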