
Machine Learning

KV Cache: A Caching Mechanism To Accelerate Transformer Generation

During the decoding process of large language models, which are auto-regressive, generation must proceed step by step until the entire sequence is produced. Along the way, caching techniques can reduce redundant computation and speed up decoding; one such technique is known as the KV Cache.
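For flavor, here is a minimal, library-agnostic sketch of the idea: each decoding step computes a query only for the newest token, while the keys and values of earlier tokens are reused from a cache. The names and shapes below are illustrative, not taken from the post.

```python
import torch

# Single-head attention step with a KV cache (illustrative sketch only).
d_model = 64
W_q = torch.randn(d_model, d_model)
W_k = torch.randn(d_model, d_model)
W_v = torch.randn(d_model, d_model)

k_cache, v_cache = [], []  # grows by one entry per generated token

def decode_step(x_new: torch.Tensor) -> torch.Tensor:
    """x_new: (1, d_model) hidden state of the newest token only."""
    q = x_new @ W_q                   # query for the new token
    k_cache.append(x_new @ W_k)       # cache its key ...
    v_cache.append(x_new @ W_v)       # ... and value
    K = torch.cat(k_cache, dim=0)     # (t, d_model): all keys so far
    V = torch.cat(v_cache, dim=0)     # (t, d_model): all values so far
    attn = torch.softmax(q @ K.T / d_model**0.5, dim=-1)
    return attn @ V                   # (1, d_model) attention output

# Each step only projects the newest token; past K/V come from the cache.
for t in range(5):
    out = decode_step(torch.randn(1, d_model))
```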

Read More »KV Cache: A Caching Mechanism To Accelerate Transformer Generation

Using Finite State Machine (FSM) and Rollback Mechanism to Restrict LLM from Generating Banned Words

When building services on top of LLMs, do you worry about uncontrolled language generation? Recently, while rushing to wrap up a project, I used tools like Outlines to constrain LLM decoding, which effectively kept the model's output within the desired patterns. However, a colleague posed a probing question: what if I want it not to generate specific words?
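As a point of reference, the naive baseline can be sketched as a custom logits processor that masks individual banned token ids at every step. The class name and usage below are my own illustration, and this simple masking cannot handle banned words that span multiple tokens, which is exactly the gap the FSM and rollback mechanism in the post addresses.

```python
import torch
from transformers import LogitsProcessor, LogitsProcessorList

class BanTokensProcessor(LogitsProcessor):
    """Naive baseline: mask out individual banned token ids at every step.

    Only works when a banned word maps to a single token; multi-token words
    need something like the FSM + rollback approach described in the post.
    """
    def __init__(self, banned_token_ids):
        self.banned_token_ids = list(banned_token_ids)

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
        scores[:, self.banned_token_ids] = float("-inf")
        return scores

# Hypothetical usage with a Hugging Face model:
# model.generate(**inputs, logits_processor=LogitsProcessorList([BanTokensProcessor(banned_ids)]))
```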

Read More »Using Finite State Machine (FSM) and Rollback Mechanism to Restrict LLM from Generating Banned Words

Note on Calculating VRAM Consumption for Training and Inference of AI Models

I've always used rough formulas to estimate how a model's scale translates into GPU VRAM consumption; after all, too many variables are involved: model architecture, number of layers, attention mechanism implementation, sequence length, batch size, the data precision used for training or inference... all of these affect the final result.
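As a rough illustration of the kind of back-of-the-envelope formula meant here, the sketch below uses common rules of thumb (not exact figures from the post): weights take parameters times bytes-per-parameter, and mixed-precision training with Adam is often quoted at around 16 bytes per parameter before activations and KV cache.

```python
def rough_vram_gb(n_params_b: float, bytes_per_param: int = 2,
                  training: bool = False) -> float:
    """Very rough VRAM estimate in GB (rule of thumb, not exact).

    n_params_b: model size in billions of parameters.
    bytes_per_param: 2 for fp16/bf16, 4 for fp32, 1 for int8.
    """
    weights = n_params_b * 1e9 * bytes_per_param
    if training:
        # Mixed-precision Adam training is often quoted at ~16 bytes/param
        # (weights + gradients + fp32 master weights + optimizer states);
        # activations and KV cache come on top of this.
        total = n_params_b * 1e9 * 16
    else:
        # Inference: weights, plus activations / KV cache that depend on
        # batch size and sequence length.
        total = weights
    return total / 1024**3

# e.g. a 7B model in bf16 needs roughly 13 GB just for the weights:
print(round(rough_vram_gb(7, bytes_per_param=2), 1))
```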

Read More »Note on Calculating VRAM Consumption for Training and Inference of AI Models

Note Of KTOTrainer (Kahneman-Tversky Optimization Trainer)

I've been reading on and off about a fine-tuning method called Kahneman-Tversky Optimization (KTO) from sources such as Hugging Face's official documentation and other online materials. Like DPO, it is a way to align models with human preferences, but KTO's data preparation format is much more convenient, so I'm applying it to my current tasks first and will make time to study the related papers in detail later.
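To give an idea of why the data format is convenient: KTO-style training takes single completions with a binary desirable/undesirable label, rather than the chosen/rejected pairs DPO needs. A hedged sketch of such a dataset, using the column names from TRL's KTOTrainer documentation (the example rows are made up):

```python
from datasets import Dataset

# Each sample is one completion with a boolean "desirable" label;
# no paired chosen/rejected responses are required.
kto_data = Dataset.from_list([
    {"prompt": "What is the capital of Japan?", "completion": "Tokyo.", "label": True},
    {"prompt": "What is the capital of Japan?", "completion": "I refuse to answer.", "label": False},
])

# Wiring it into TRL would look roughly like (exact arguments depend on the TRL version):
# trainer = KTOTrainer(model=model, ref_model=ref_model, args=KTOConfig(...),
#                      train_dataset=kto_data)
```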

Read More »Note Of KTOTrainer (Kahneman-Tversky Optimization Trainer)

Notes on Fine-Tuning a Multi-Modal Large Language Model Using SFTTrainer (Taking LLaVa-1.5 as an Example)

A multi-modal large language model (MLLM) isn't limited to text. I know that calling it a "language model" might sound contradictory, but the term has become widely accepted. What I want to document today is how to fine-tune such a multi-modal model using a script.
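For context, loading LLaVA-1.5 for fine-tuning might start roughly like the sketch below; the checkpoint is the public "llava-hf/llava-1.5-7b-hf" repo, the LoRA settings are illustrative assumptions, and the actual SFTTrainer wiring (dataset with images and chat text, custom collator) is what the post itself walks through.

```python
import torch
from transformers import AutoProcessor, LlavaForConditionalGeneration
from peft import LoraConfig

model_id = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Train only small adapter weights on the language side; the LoRA values
# here are illustrative, not taken from the post.
peft_config = LoraConfig(r=16, lora_alpha=32,
                         target_modules=["q_proj", "v_proj"],
                         task_type="CAUSAL_LM")
```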

Read More »Notes on Fine-Tuning a Multi-Modal Large Language Model Using SFTTrainer (Taking LLaVa-1.5 as an Example)

[Machine Learning] Vector Quantization (VQ) Notes

The first time I heard about Vector Quantization (VQ) was from a friend working on audio processing, which left me with only a vague understanding that VQ is a technique for compressing and representing data features. At the time, I still wasn't clear on how it differed from dimensionality reduction techniques like PCA.
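As a quick illustration of the difference: VQ replaces each vector with the index of its nearest codebook entry, whereas PCA projects onto fewer continuous dimensions. A minimal sketch using k-means as the codebook learner (the sizes here are arbitrary):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 16))          # 1000 feature vectors of dim 16

codebook_size = 32
kmeans = KMeans(n_clusters=codebook_size, n_init=10, random_state=0).fit(X)
codebook = kmeans.cluster_centers_       # (32, 16) codebook vectors

codes = kmeans.predict(X)                # each vector becomes a single integer code
X_hat = codebook[codes]                  # lossy reconstruction from the codebook

# Unlike PCA, the compressed representation is a discrete index into a
# codebook rather than a lower-dimensional continuous projection.
```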

Read More »[Machine Learning] Vector Quantization (VQ) Notes