AI

Using Finite State Machine (FSM) and Rollback Mechanism to Restrict LLM from Generating Banned Words

Clay
2024-10-29
AI, Machine Learning

When implementing various services through LLMs, do you worry about uncontrolled language generation? Recently, while wrapping up a project, I used tools like Outlines to constrain LLM decoding, which effectively kept the model’s output within the desired patterns. However, a colleague posed a deeper question: what if I want the model to avoid generating specific words?

Clay
2024-10-24
AI, Machine Learning

I’ve always used rough formulas to estimate the relationship between the scale of my models and the GPU VRAM consumption; after all, there are too many variables involved—model architecture, number of layers, attention mechanism implementation, sequence length, batch size, data precision used in training or inference… all of these affect our final calculation results.

Clay
2024-10-22
AI, Essay

Today, while I was eating, I came across a video (the video is attached at the end of this article). Unlike many tech channels that jump straight into discussing AI, economics, and replacing humans, this video took a more careful approach. It explained in detail how hardware specifications have influenced algorithms (or AI model architectures) over time.

Clay
2024-10-19
AI, Machine Learning

I’ve been intermittently reading about a fine-tuning method called Kahneman-Tversky Optimization (KTO) from various sources like HuggingFace’s official documents and other online materials. It’s similar to DPO as a way to align models with human values, but KTO’s data preparation format is much more convenient, so I’m quickly applying it to my current tasks before making time to study the detailed content in the related papers.

Clay
2024-10-16
AI, Machine Learning, Papers

The following are some key points from the paper ENTP: Encoder-only Next Token Prediction:

Clay
2024-10-08
AI, Machine Learning, PyTorch

A Multi-Modal Large Language Model isn’t limited to text only. I know that sounds contradictory for something called a language model, but the term has become widely accepted. What I want to document today is how to fine-tune a multi-modal model using a script.

Clay
2024-10-06
AI

This year, due to work, I tried annotating the data myself; it was only after diving into it personally that I truly understood just how profoundly training data affects an AI model.

Clay
2024-09-25
AI, Machine Learning

In the process of training and fine-tuning deep neural networks, the most important and scarce resource is undoubtedly the GPU’s VRAM. Therefore, making every bit perform at its best is a critical task.

Clay
2024-09-14
AI, Machine Learning

Problem Description

Recently, I’ve achieved good results in an application by fine-tuning Gemma-2. However, I ran into various errors when deploying the model on the client’s equipment, which was quite frustrating. I couldn’t find a systematic troubleshooting guide online, so I’m documenting the issues here.

Clay
2024-09-03
AI, Machine Learning

When applying Large Language Models (LLMs) in real-world scenarios, it’s often not just about letting the model generate text freely. We might want the model to return a specific structure, such as the answer to a multiple-choice question or a rating. In such cases, models built on transformers can use the Outlines tool directly.

AI

Using Finite State Machine (FSM) and Rollback Mechanism to Restrict LLM from Generating Banned Words

Note on Calculating VRAM Consumption for Training and Inference of AI Models

Here’s a thought: Will Transformers be replaced in the future?

Note Of KTOTrainer (Kahneman-Tversky Optimization Trainer)

[Paper Reading] ENTP: ENCODER-ONLY NEXT TOKEN PREDICTION

Notes on Fine-Tuning a Multi-Modal Large Language Model Using SFTTrainer (Taking LLaVa-1.5 as an Example)

“Common sense, as people call it, is merely the biases learned during youth”—the training data for AI models is no different

Differences in Precision Representations in Deep Learning: Float32, Float16, Float8, and BFloat16

Troubleshooting Accelerated Inference of Gemma-2 on V100 GPUs Using vLLM

Structuring Model Outputs Using the Outlines Tool