November 2024

Notes on Brewer's (CAP) Theorem

Clay
2024-11-26
Computer

Last Updated on 2024-11-26 by Clay

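I've recently been reading notes on distributed systems, hoping to reflect on the systems I built over the past year and see what could be improved. Around that time someone recommended I look at the CAP theorem. On first read it felt quite intuitive, so I'm writing it down here.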

Clay
2024-11-25
C++, LeetCode, Python

Last Updated on 2024-11-25 by Clay

Problem

Given an integer array nums where the elements are sorted in ascending order, convert it to a height-balanced binary search tree.
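One common way to approach this problem is to pick the middle element as the root and recurse on the two halves, which keeps the subtree sizes within one of each other. The sketch below assumes a simple `TreeNode` dataclass (LeetCode supplies its own class) and a hypothetical function name `sorted_array_to_bst`; it is an illustration, not necessarily the solution used in the post.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TreeNode:
    # Assumed node shape for illustration; LeetCode provides its own TreeNode.
    val: int
    left: Optional["TreeNode"] = None
    right: Optional["TreeNode"] = None


def sorted_array_to_bst(nums: List[int]) -> Optional[TreeNode]:
    """Build a height-balanced BST from an ascending array."""

    def build(lo: int, hi: int) -> Optional[TreeNode]:
        if lo > hi:
            return None  # empty range, no node
        mid = (lo + hi) // 2  # middle element becomes the subtree root
        return TreeNode(nums[mid], build(lo, mid - 1), build(mid + 1, hi))

    return build(0, len(nums) - 1)


def height(node: Optional[TreeNode]) -> int:
    """Number of nodes on the longest root-to-leaf path."""
    return 0 if node is None else 1 + max(height(node.left), height(node.right))
```

Passing index bounds instead of slicing avoids copying the array at every level, so the whole build is linear in the length of `nums`.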

Clay
2024-11-21 (updated 2024-11-22)
AI, Machine Learning, PyTorch

Last Updated on 2024-11-22 by Clay

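Many of the inference-acceleration techniques I've looked at so far, such as Speculative Decoding, set a threshold on the draft model's confidence score to decide how many draft tokens to decode before handing them to the target model for verification. This cuts the time the draft model would otherwise spend speculating further when its confidence is low.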
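The draft-side threshold described above can be sketched as a greedy loop that stops as soon as the draft model's top probability drops below the threshold. Everything here is an assumption for illustration: `next_token_probs` is a hypothetical callable standing in for a real draft model, and the default `threshold` and `max_draft_tokens` values are arbitrary.

```python
from typing import Callable, List, Sequence


def draft_until_unsure(
    next_token_probs: Callable[[List[int]], Sequence[float]],
    prefix: List[int],
    threshold: float = 0.6,
    max_draft_tokens: int = 8,
) -> List[int]:
    """Greedily draft tokens while the draft model stays confident."""
    context = list(prefix)
    draft: List[int] = []
    for _ in range(max_draft_tokens):
        probs = next_token_probs(context)
        # Greedy choice: index of the most probable token.
        token = max(range(len(probs)), key=probs.__getitem__)
        if probs[token] < threshold:
            break  # low confidence: stop and let the target model verify
        draft.append(token)
        context.append(token)
    return draft
```

In this sketch the low-confidence token is dropped rather than kept as the final draft token; either choice is workable, since the target model verifies whatever is handed over.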

Clay
2024-11-18
AI, Machine Learning

Last Updated on 2024-11-18 by Clay

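I've recently been implementing a number of Speculative Decoding acceleration methods, and HuggingFace's transformers package naturally has its own counterpart, assistant_model. I'm taking this opportunity to write it down as well.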
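In `transformers`, assisted generation is requested by passing a smaller model as the `assistant_model` keyword argument of `generate()`. A minimal sketch follows; the wrapper name `generate_with_assistant` and the `TARGET_ID` / `ASSISTANT_ID` placeholders are hypothetical, and the two checkpoints are assumed to share a tokenizer.

```python
def generate_with_assistant(target_model, assistant_model, tokenizer,
                            prompt: str, max_new_tokens: int = 64) -> str:
    """Run generate() on the target model with a draft (assistant) model."""
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = target_model.generate(
        **inputs,
        assistant_model=assistant_model,  # enables assisted generation
        max_new_tokens=max_new_tokens,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Loading real models (TARGET_ID and ASSISTANT_ID are placeholders, not
# checkpoints named in the post):
#
#   from transformers import AutoModelForCausalLM, AutoTokenizer
#   tokenizer = AutoTokenizer.from_pretrained(TARGET_ID)
#   target = AutoModelForCausalLM.from_pretrained(TARGET_ID)
#   assistant = AutoModelForCausalLM.from_pretrained(ASSISTANT_ID)
#   print(generate_with_assistant(target, assistant, tokenizer, "Hello"))
```

How much this speeds things up depends on how often the assistant's guesses are accepted, so it is worth timing against a plain `generate()` call on your own prompts.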

Clay
2024-11-17
AI, Machine Learning, Python, PyTorch

Last Updated on 2024-11-17 by Clay

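Over the past week, I found time to reproduce Self-Speculative Decoding following the paper Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding, including the following modules:

A Decoder-only Transformer model with layer-skipping decoding (mainly for the Llama and Gemma-2 architectures)
An adaptive draft-exiting mechanism
Bayesian optimization to explore the best layer-skip strategy (finding which combination makes the best draft model)
Self-Speculative Decoding: acceleration achieved with the model itself alone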
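The layer-skipping idea behind the first module can be illustrated with a toy forward pass: the draft pass runs the same layer stack but passes the hidden state through unchanged at the skipped indices. This is only a schematic with plain Python callables standing in for Transformer layers; the name `forward` and its parameters are assumptions, not the post's actual Llama or Gemma-2 code.

```python
from typing import Callable, Iterable, Sequence

Layer = Callable[[float], float]  # toy stand-in for a Transformer layer


def forward(hidden: float, layers: Sequence[Layer],
            skip: Iterable[int] = ()) -> float:
    """Apply every layer in order, except those whose index is in `skip`."""
    skipped = set(skip)
    for index, layer in enumerate(layers):
        if index in skipped:
            continue  # draft pass: hidden state flows through unchanged
        hidden = layer(hidden)
    return hidden


toy_layers = [lambda h: h + 1, lambda h: h * 2, lambda h: h - 3]
full_output = forward(1.0, toy_layers)             # target model: all layers
draft_output = forward(1.0, toy_layers, skip=[1])  # draft model: layer 1 skipped
```

Because the draft and target passes share the same weights, no second model has to be held in memory; only the set of skipped indices changes.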

Clay
2024-11-14
AI, Machine Learning, Papers

Last Updated on 2024-11-14 by Clay

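Key Points of the Paper

Quantization, pruning, and distillation can also speed up inference, but they come with issues such as an output distribution that differs from the original model and the cost of retraining
The original Speculative Decoding needs extra memory to run a draft model, whereas Self-Speculative Decoding uses only part of the model's own network as the draft model
The Adaptive Draft-Exiting Mechanism automatically adjusts the confidence-score threshold, which in turn adjusts how many tokens the draft model speculates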
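An adaptive threshold of this kind can be sketched as a small controller that tracks a smoothed acceptance rate and nudges the threshold up when too many draft tokens are rejected, and down otherwise. The update below follows my reading of the paper's rule; the class name, field names, and all default constants are illustrative assumptions rather than values taken from the post.

```python
from dataclasses import dataclass


@dataclass
class AdaptiveThreshold:
    threshold: float = 0.6            # current draft-exit confidence threshold
    target_acceptance: float = 0.9    # acceptance rate we aim to stay above
    step: float = 0.01                # size of each nudge
    rate_smoothing: float = 0.5       # smoothing for the acceptance rate
    threshold_smoothing: float = 0.9  # smoothing for the threshold itself
    acceptance_rate: float = 1.0      # running (smoothed) acceptance rate

    def update(self, accepted: int, drafted: int) -> float:
        """Update after one verification round and return the new threshold."""
        if drafted == 0:
            return self.threshold  # nothing was drafted, nothing to learn
        observed = accepted / drafted
        self.acceptance_rate = (self.rate_smoothing * self.acceptance_rate
                                + (1 - self.rate_smoothing) * observed)
        if self.acceptance_rate <= self.target_acceptance:
            proposal = self.threshold + self.step  # too many rejections: be stricter
        else:
            proposal = self.threshold - self.step  # drafts are reliable: draft more
        self.threshold = (self.threshold_smoothing * self.threshold
                          + (1 - self.threshold_smoothing) * proposal)
        return self.threshold
```

A higher threshold makes the draft model exit earlier, so fewer tokens are speculated per round; a lower one lets it run longer when its guesses are being accepted.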

Clay
2024-11-13
AI, Machine Learning

Last Updated on 2024-11-13 by Clay

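In Self-Speculative Decoding, the draft model is made up of part of the target model's network, so finding a good Layer Skip Strategy matters a great deal: we need to skip enough layers to get a real speedup, while keeping the draft model's speculative quality high enough that its tokens are not easily rejected during target-model verification.

So today's implementation uses the Bayesian optimization framework Optuna to tune the LayerSkip model I implemented earlier and decide which layers to skip.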
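A search like this can be framed as one boolean choice per layer, scored by some measure of decoding speed. The sketch below shows only the shape of an Optuna objective: `NUM_LAYERS` and `measure_tokens_per_second` are hypothetical, and the scoring function is a toy formula standing in for a real benchmark of the LayerSkip model.

```python
from typing import List

NUM_LAYERS = 8  # hypothetical layer count, for illustration only


def measure_tokens_per_second(skip_layers: List[int]) -> float:
    """Toy stand-in for a real benchmark (NOT a measurement).

    Rewards each skipped layer, but penalises skipping early layers to mimic
    the draft quality loss that would lower the acceptance rate.
    """
    speedup = float(len(skip_layers))
    penalty = sum(1.5 for i in skip_layers if i < NUM_LAYERS // 2)
    return speedup - penalty


def objective(trial) -> float:
    """Optuna objective: choose skip/keep for every layer, return the score."""
    skip_layers = [
        i for i in range(NUM_LAYERS)
        if trial.suggest_categorical(f"skip_layer_{i}", [False, True])
    ]
    return measure_tokens_per_second(skip_layers)


# Running the search (requires `pip install optuna`):
#
#   import optuna
#   study = optuna.create_study(direction="maximize")
#   study.optimize(objective, n_trials=50)
#   print(study.best_params)
```

In a real run the objective would generate text with the candidate skip set and return the measured throughput, so each trial costs a full benchmark pass.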

Clay
2024-11-10
AI, Machine Learning, PyTorch

Last Updated on 2024-11-10 by Clay

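Introduction

Self-Speculative Decoding is a variant of Speculative Decoding. The original Speculative Decoding uses a draft model to speed up the target model we actually want to run inference on. The draft model produces outputs similar to the target model's while being several times faster at inference, and it is usually distilled from the target model.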

Clay
2024-11-07
Math

Last Updated on 2024-11-07 by Clay

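Preface

I've recently been organizing the inference-acceleration papers I read over the past year into notes. Along the way I came across Bayesian optimization, which builds on Bayes' theorem, so I decided to also write a note on the ideas behind Bayes' theorem.

In short, Bayes' Theorem is a frequently encountered theorem in probability theory that describes the probability of a random event given that a particular condition holds.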
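For reference, the standard statement of the theorem for two events $A$ and $B$ with $P(B) > 0$ is given below, followed by a small worked example. The numbers in the example are made up purely for illustration.

```latex
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)},
\qquad
P(B) = P(B \mid A)\, P(A) + P(B \mid \neg A)\, P(\neg A)

% Illustrative numbers: P(A) = 0.01, P(B \mid A) = 0.9, P(B \mid \neg A) = 0.05
P(B) = 0.9 \times 0.01 + 0.05 \times 0.99 = 0.0585,
\qquad
P(A \mid B) = \frac{0.009}{0.0585} \approx 0.154
```

Even with a 90% likelihood $P(B \mid A)$, the posterior stays around 15% because the prior $P(A)$ is small.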

Clay
2024-11-05 (updated 2024-11-06)
Machine Learning, PyTorch

Last Updated on 2024-11-06 by Clay

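Introduction

Speculative Decoding is a highly practical inference-acceleration technique. A small model (the draft model) quickly decodes several tokens in a row, keeping the sampling probability distribution from each step. The large model we actually want to accelerate (the target model) then predicts the next token on top of this, while computing the sampling probability distributions for all the preceding token positions in a single pass. The target model probs are then used to check the validity of the draft model probs, and the draft tokens that are reliable enough are accepted.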
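The verification step can be sketched with the standard speculative sampling rule: accept a draft token with probability min(1, p/q), where q and p are the draft and target probabilities of that token, and on the first rejection resample from the normalised positive part of p − q. Whether the post uses exactly this rule is an assumption; `verify_draft` and its parameters are hypothetical names, and the bonus token normally sampled from the target model when every draft token is accepted is omitted for brevity.

```python
import random
from typing import List, Sequence


def verify_draft(
    draft_tokens: Sequence[int],
    draft_probs: Sequence[Sequence[float]],   # q: one distribution per position
    target_probs: Sequence[Sequence[float]],  # p: one distribution per position
    rng: random.Random,
) -> List[int]:
    """Return the accepted prefix, plus one resampled token after a rejection."""
    output: List[int] = []
    for token, q, p in zip(draft_tokens, draft_probs, target_probs):
        # q[token] > 0 because the draft model actually sampled this token.
        if rng.random() < min(1.0, p[token] / q[token]):
            output.append(token)  # target agrees often enough: keep it
            continue
        # Rejected, so p[token] < q[token] and the residual has positive mass.
        residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
        output.append(rng.choices(range(len(p)), weights=residual)[0])
        break  # everything after the first rejection is discarded
    return output
```

Resampling from the residual is what keeps the output distributed like the target model's own samples, rather than merely close to them.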

November 2024

Notes on Brewer's (CAP) Theorem

LeetCode: 108. Convert Sorted Array to Binary Search Tree Solution Notes

Using the Target Model's Confidence Threshold to Decide Whether to Enable Draft Speculation in Speculative Decoding

Speeding Up Speculative Decoding with `assistant_model` in the HuggingFace `transformers` Package

Self-Speculative Decoding Full Implementation: LayerSkip Model, Bayesian Optimization, and Adaptive Draft-Exiting Mechanism (with gemma-2-9b-it Experimental Results)

[Paper Reading] Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding

Searching for the Best Layer-Skip Strategy of a LayerSkip Model with Bayesian Optimization

Self-Speculative Decoding Implementation: Notes on Implementing a Layer-Skipping Transformer Model

Notes on Bayes' Theorem

Speculative Decoding Implementation Notes (with Simple Experimental Results)
