[Paper Reading] Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads

Last Updated on 2025-03-25 by Clay. Lately I have still been reading up on inference acceleration, but unfortunately, what I have on hand …