AI

Using Finite State Machine (FSM) and Rollback Mechanism to Restrict LLM from Generating Banned Words

When building services on top of LLMs, do you worry about uncontrolled generation? Recently, while wrapping up a project, I used tools like Outlines to constrain LLM decoding, which effectively forced the model's output to follow the desired patterns. Then a colleague raised a harder question: what if I want the model to never generate specific words?

Note on Calculating VRAM Consumption for Training and Inference of AI Models

I've always used rough formulas to estimate how GPU VRAM consumption scales with model size. After all, too many variables are involved: model architecture, number of layers, attention mechanism implementation, sequence length, batch size, and the data precision used in training or inference. All of these affect the final estimate.

Here’s a thought: Will Transformers be replaced in the future?

Today, while I was eating, I came across a video (attached at the end of this article). Unlike many tech channels that jump straight into AI, economics, and replacing humans, this video took a more careful approach. It explained in detail how hardware specifications have shaped algorithms (and AI model architectures) over time.

Note Of KTOTrainer (Kahneman-Tversky Optimization Trainer)

I've been reading on and off about a fine-tuning method called Kahneman-Tversky Optimization (KTO), drawing on Hugging Face's official documentation and other online materials. Like DPO, it is a way to align models with human values, but KTO's data format is much more convenient to prepare, so I'm applying it to my current tasks right away and will make time to study the related papers in detail later.

Notes on Fine-Tuning a Multi-Modal Large Language Model Using SFTTrainer (Taking LLaVa-1.5 as an Example)

A multi-modal large language model isn't limited to text only. I know this might sound contradictory for something called a "language model", but the term has become widely accepted. What I want to document today is how to fine-tune a multi-modal model using a script.

Troubleshooting Accelerated Inference of Gemma-2 on V100 GPUs Using vLLM

Problem Description

Recently, I got good results in an application by fine-tuning Gemma-2. However, I ran into various errors when deploying it on the client's hardware, which was quite frustrating. There doesn't currently seem to be a systematic troubleshooting guide online, so I'm documenting the process here.
