June 30, 2025

Supporting Hydra Speculative Decoding in the TensorRT-LLM Python Session

Last Updated on 2025-07-01 by Clay

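I have previously read about many different speculative decoding techniques for accelerating inference, and I have implemented several of these architectures in PyTorch, including scripts for the model architecture, training, and inference (fast-llm-inference). This time, of course, I have a new goal.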