When applying Large Language Models (LLMs) in real-world scenarios, we often can't just let the model generate text freely. We may want the model to return a specific structure, such as answering a multiple-choice question or providing a rating. In such cases, transformers-based models can use the Outlines tool directly.
I first learned about the Outlines tool from a colleague while we were discussing the internal mechanisms of vLLM. Outlines lets us define multiple candidate options and restrict the model's output to one of them. Beyond that, we can require the model to generate integers, floating-point numbers, JSON… and even force the output to conform to a given regular expression (regex)!
Being able to precisely control the format generated by an LLM is crucial for further deploying the LLM in real-world applications.
Principles of Outlines
Outlines defines a finite state machine using regular expressions to constrain each token generated by the model. In other words, tokens that don’t match the predefined rules won’t be decoded. Imagine a simple scenario:
Now we ask the model: How does this ice cream taste? Then we set only two candidate tokens for the model to choose from: Hot / Cold.
Even though the model might normally decode and give us: I'm sorry, I am a robot, I can't eat any ice cream!, during the actual decoding process, even if the probability score for I'm is 0.9 and the probabilities for Hot and Cold are only 0.01 and 0.02 respectively, the model will only produce the token Cold, because only Hot and Cold are allowed to be decoded.
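Conceptually, this works like a mask over the vocabulary at every decoding step: the scores of disallowed tokens are set to negative infinity, so their probability becomes zero. The following is a tiny, hypothetical sketch of that idea (not Outlines' actual implementation, and the numbers are made up):

import torch

# Hypothetical logits over a tiny vocabulary at one decoding step
vocab = ["I'm", "Hot", "Cold", "sorry"]
logits = torch.tensor([4.0, 0.5, 1.0, 2.0])  # "I'm" scores highest unconstrained

# Only "Hot" and "Cold" satisfy the constraint; mask everything else out
allowed = {"Hot", "Cold"}
mask = torch.tensor([0.0 if token in allowed else float("-inf") for token in vocab])

# After masking, disallowed tokens get probability 0
probs = torch.softmax(logits + mask, dim=-1)
print(vocab[int(probs.argmax())])  # Cold, the highest-scoring allowed token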
Practical Application
First, we can install outlines directly via pip:
pip3 install outlines
Next, let’s see how the model handles a multiple-choice question:
import outlines

# Load a local Hugging Face model through the transformers backend
model = outlines.models.transformers("./models/google--gemma-2-2b-it")

prompt = """How does this ice cream taste?"""

# Constrain generation to exactly one of the two candidate answers
generator = outlines.generate.choice(model, ["Hot", "Cold"])
answer = generator(prompt)

print(answer)
Output:
Cold
Then, as mentioned earlier, we can control the type of numeric output the model generates:
prompt = "<s>result of 9 + 9 = 18</s><s>result of 1 + 2 = "
answer = outlines.generate.format(model, int)(prompt)
print(answer)
Output:
3
But if we replace int with float:
prompt = "<s>result of 9 + 9 = 18</s><s>result of 1 + 2 = "
answer = outlines.generate.format(model, float)(prompt)
print(answer)
Output:
3.0
It immediately changes from 3 to 3.0!
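Beyond built-in types, as mentioned at the beginning, the output can also be forced to match a regular expression. Here is a small sketch using outlines.generate.regex; the prompt and pattern are just illustrative assumptions:

# Constrain the answer to a clock time in H:MM / HH:MM format
prompt = "What time do most people eat lunch? Answer with a time: "
generator = outlines.generate.regex(model, r"(0?[1-9]|1[0-2]):[0-5][0-9]")
answer = generator(prompt)

print(answer)  # e.g. 12:00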
Finally, let’s try something fun: character creation.
We can use pydantic to add some constraints, such as the length of the name, hair color options, and so on.
from enum import Enum
from pydantic import BaseModel, constr

import outlines


# Allowed hair colors: the model must pick one of these values
class Colors(str, Enum):
    red = "red"
    brown = "brown"
    white = "white"
    black = "black"


# Character schema; the name is capped at 10 characters
class Character(BaseModel):
    name: constr(max_length=10)
    age: int
    hair_color: Colors


generator = outlines.generate.json(model, Character)
character = generator("Give me a character description")

print(character)
Output:
name='The Oracle' age=32 hair_color=<Colors.black: 'black'>
Is it really generating The Oracle? Maybe I've watched The Matrix too many times… but 32 is way too young for the Oracle.
Of course, we can modify the prompt to generate the character we want:
generator = outlines.generate.json(model, Character)
character = generator("Give me a superman character description")
print(character)
Output:
In short, this tool can be applied in many scenarios, and it supports well-known frameworks such as Ollama, vLLM, and llama-cpp, making it easy to integrate. I feel it has great potential for application in all kinds of services!
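For example, switching the inference backend is mostly a matter of loading the model differently. A minimal sketch, assuming the Outlines version you installed ships the vLLM integration (the model path is reused from above):

import outlines

# Load the same model through the vLLM backend instead of transformers;
# the downstream generators (choice / format / json / regex) work unchanged.
model = outlines.models.vllm("./models/google--gemma-2-2b-it")

generator = outlines.generate.choice(model, ["Hot", "Cold"])
print(generator("How does this ice cream taste?"))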
References
- outlines-dev/outlines: Structured Text Generation
- Fast, High-Fidelity LLM Decoding with Regex Constraints