AI

Note on Calculating VRAM Consumption for Training and Inference of AI Models

I've always used rough formulas to estimate the relationship between a model's scale and its GPU VRAM consumption; after all, there are many variables involved: model architecture, number of layers, the attention implementation, sequence length, batch size, the numeric precision used for training or inference... all of these affect the final estimate.
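As a rough illustration of the kind of formula I mean, here is a back-of-the-envelope sketch, not an exact calculation: it assumes the common rule of thumb that inference needs roughly the parameter count times the bytes per parameter, and that full fine-tuning with Adam in mixed precision needs roughly 16 bytes per parameter, while ignoring activations and the KV cache entirely.

```python
def estimate_vram_gb(n_params_billion: float, bytes_per_param: int = 2, training: bool = False) -> float:
    """Back-of-the-envelope VRAM estimate in GB.

    Assumptions (a common rule of thumb, ignoring activations and the KV cache):
      - inference: weights only, i.e. parameters * bytes per parameter
      - full training with Adam in mixed precision: ~16 bytes per parameter
        (fp16 weights + fp16 gradients + fp32 master weights and optimizer states)
    """
    n_params = n_params_billion * 1e9
    bytes_total = n_params * (16 if training else bytes_per_param)
    return bytes_total / 1024**3

print(estimate_vram_gb(7))                 # ~13 GB just to hold a 7B model in fp16
print(estimate_vram_gb(7, training=True))  # ~104 GB for full fine-tuning with Adam
```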

Here’s a thought: Will Transformers be replaced in the future?

Today, while I was eating, I came across a video (it's attached at the end of this article). Unlike many tech channels that jump straight into discussing AI, economics, and humans being replaced, this video took a more careful approach: it explained in detail how hardware specifications have shaped algorithms (and AI model architectures) over time.

Notes on KTOTrainer (Kahneman-Tversky Optimization Trainer)

I've been intermittently reading about a fine-tuning method called Kahneman-Tversky Optimization (KTO) from various sources, such as Hugging Face's official documentation and other online materials. Like DPO, it's a way to align models with human values, but KTO's data preparation format is much more convenient, so I'm applying it to my current tasks right away and will make time to study the related papers in detail later.
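To illustrate why the data preparation is more convenient (a minimal sketch following the trl documentation; the example strings are hypothetical): DPO needs paired chosen/rejected responses for the same prompt, while KTOTrainer only needs a single completion with a boolean label, so unpaired thumbs-up / thumbs-down feedback can be used directly.

```python
# DPO-style preference data: paired responses for the same prompt.
dpo_example = {
    "prompt": "Summarize the meeting notes.",
    "chosen": "Here is a concise summary of the key decisions ...",
    "rejected": "Sorry, I can't help with that.",
}

# KTO-style data: a single completion plus a desirable/undesirable flag,
# which maps naturally onto unpaired thumbs-up / thumbs-down feedback.
kto_example = {
    "prompt": "Summarize the meeting notes.",
    "completion": "Here is a concise summary of the key decisions ...",
    "label": True,  # True = desirable, False = undesirable
}
```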

Notes on Fine-Tuning a Multi-Modal Large Language Model Using SFTTrainer (Taking LLaVa-1.5 as an Example)

A multi-modal large language model (MLLM) isn't limited to text. I know that calling something that also handles images a "language model" might sound contradictory, but the term has become widely accepted. What I want to document today is how to fine-tune a multi-modal model using a script.
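Roughly, such a script wires a custom collator into SFTTrainer. The sketch below is only an outline under assumptions, not the exact script from the post: the dataset `my_vqa_dataset` and its "image"/"question"/"answer" columns are hypothetical, and some SFTConfig/SFTTrainer argument names differ between trl versions.

```python
import torch
from transformers import AutoProcessor, LlavaForConditionalGeneration
from trl import SFTConfig, SFTTrainer

model_id = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(model_id, torch_dtype=torch.float16)

def collate_fn(examples):
    # LLaVA-1.5 expects a "USER: <image>\n{question} ASSISTANT: {answer}" style prompt.
    texts = [f"USER: <image>\n{ex['question']} ASSISTANT: {ex['answer']}" for ex in examples]
    images = [ex["image"] for ex in examples]
    batch = processor(text=texts, images=images, return_tensors="pt", padding=True)
    labels = batch["input_ids"].clone()
    labels[labels == processor.tokenizer.pad_token_id] = -100  # no loss on padding tokens
    batch["labels"] = labels
    return batch

args = SFTConfig(
    output_dir="llava-1.5-sft",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    remove_unused_columns=False,                    # keep the raw image column for the collator
    dataset_kwargs={"skip_prepare_dataset": True},  # the collator handles all preprocessing
)

trainer = SFTTrainer(
    model=model,
    args=args,
    train_dataset=my_vqa_dataset,  # hypothetical dataset with "image", "question", "answer"
    data_collator=collate_fn,
)
trainer.train()
```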

Troubleshooting Accelerated Inference of Gemma-2 on V100 GPUs Using vLLM

Problem Description

Recently, I've achieved some good results in an application by fine-tuning Gemma-2. However, I ran into various errors when deploying it on the client's hardware, which was quite frustrating. There isn't a systematic troubleshooting guide online yet, so I'm documenting the process here.
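For context, a minimal vLLM setup for this kind of deployment might look like the sketch below. This is not the configuration from the troubleshooting itself: the model name is a placeholder, and the one assumption baked in is a general hardware fact, namely that V100s (compute capability 7.0) don't support bfloat16, so the weights have to be loaded in float16.

```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="google/gemma-2-9b-it",  # placeholder; substitute the fine-tuned checkpoint
    dtype="float16",               # V100 has no bfloat16 support, so force fp16
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Explain in one sentence what vLLM does."], params)
print(outputs[0].outputs[0].text)
```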

Evaluating LLM Defense Capabilities Using the Microsoft BIPIA Framework

LLM services now cover a wide range of fields, and prompt injection and jailbreak threats to LLMs are growing by the day. A few months ago, a customer-service LLM even gave out incorrect information that ended up harming a customer's interests (although that incident wasn't caused by a prompt attack).

Microsoft's open-source BIPIA (Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models) framework hasn't seen significant updates since I tried it about six months ago, but it remains a simple and convenient way to test the tasks I have at hand.
