Optimizing LayerSkip Models with Bayesian Search for an Effective Layer Skipping Strategy

Last Updated on 2024-11-15 by Clay

In self-speculative decoding, since our draft model is derived from part of the target model’s network, finding an optimal ‘Layer Skip Strategy’ is crucial. We need to skip enough layers to achieve meaningful speedup while ensuring the draft model’s speculative decoding is good enough to avoid frequent rejection by …
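
To make this trade-off concrete, here is a minimal sketch using the standard expected-speedup estimate for speculative decoding. It is not taken from the post itself. The function name `expected_speedup`, its parameters, and the numbers in the usage lines are all hypothetical, and the sketch assumes that the cost of one draft step relative to one target step is roughly the fraction of layers the draft keeps, and that each drafted token is accepted independently with the same probability.

```python
def expected_speedup(acceptance_rate: float, draft_len: int, kept_fraction: float) -> float:
    """Rough speedup estimate for speculative decoding (illustrative only).

    acceptance_rate: assumed probability that a drafted token is accepted.
    draft_len:       number of tokens drafted per verification step.
    kept_fraction:   assumed draft cost relative to the target model,
                     approximated here by the share of layers NOT skipped.
    """
    if not 0.0 <= acceptance_rate < 1.0:
        raise ValueError("acceptance_rate must be in [0, 1)")
    if draft_len < 1:
        raise ValueError("draft_len must be at least 1")
    if not 0.0 < kept_fraction <= 1.0:
        raise ValueError("kept_fraction must be in (0, 1]")

    # Expected tokens produced per draft-then-verify iteration.
    expected_tokens = (1.0 - acceptance_rate ** (draft_len + 1)) / (1.0 - acceptance_rate)
    # Cost of that iteration, measured in target-model forward passes.
    iteration_cost = draft_len * kept_fraction + 1.0
    return expected_tokens / iteration_cost


# Two hypothetical strategies; the values are made up, not measured.
aggressive = expected_speedup(acceptance_rate=0.5, draft_len=4, kept_fraction=0.3)
moderate = expected_speedup(acceptance_rate=0.8, draft_len=4, kept_fraction=0.5)
print(f"aggressive skip: {aggressive:.2f}x, moderate skip: {moderate:.2f}x")
```

Under these made-up numbers, the strategy that skips more layers comes out slower than not speculating at all, because the lower acceptance rate outweighs the cheaper draft. A search over skip strategies is looking for the point where this estimate, or a measured equivalent, is highest.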