Using the Integrated Outlines Tool for Decoding Constraints in the vLLM Inference Acceleration Framework

Last Updated on 2024-09-07 by Clay

Recently, I integrated several applications of Outlines into my current workflow. The one I use most often is its integration with vLLM. For some reason, however, the documentation for it was never merged into the vLLM GitHub repository, so while designing my workflow I had to keep referring to the source code of a rejected PR for guidance XD

So I decided to compile these notes here for my own future reference.

Readers who would like to learn more about Outlines and finite-state machines (FSM) can check out my previous notes, listed under References at the end of this post.

How to Use Outlines in vLLM

Specific JSON Format

We can describe the JSON format that the LLM must generate with a Pydantic model, then pass the model's JSON schema to vLLM through the guided_json field of the request.

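import json
import requests

from pydantic import BaseModel, Field

# Assumed placeholders: point these at your own vLLM OpenAI-compatible server and served model
vllm_url = "http://localhost:8000/v1/chat/completions"
model = "your-model-name"

# Pydantic model describing the JSON object the LLM must produce
class Answer(BaseModel):
    is_human: bool
    age: int = Field(..., ge=0, le=5)

# Export the model as a JSON schema; it is sent in the guided_json field below
metric_schema = Answer.model_json_schema()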

input_data = {
    "model": model,
    "guided_json": metric_schema,
    "messages": [
        {
            "role": "user",
            "content": "You are a helpful assistant.",
        },
        {
            "role": "assistant",
            "content": "Nice to meet you!",
        },
        {
            "role": "user",
            "content": "How old are you? Are you a human?",
        }
    ]
}

response = requests.post(url=vllm_url, json=input_data)
# The message content is a JSON string that follows the schema, so parse it into a dict
print(json.loads(response.json()["choices"][0]["message"]["content"]))

Output:

{'is_human': False, 'age': 0}
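Constrained decoding shapes the output, but the returned string can still be worth checking on the client, for example if generation is cut off by a token limit. Below is a minimal sketch, assuming Pydantic v2 (which the model_json_schema() call above already implies). The helper name parse_answer is hypothetical, and the raw strings are hand-written stand-ins for a server response, not real model output.

```python
from typing import Optional

from pydantic import BaseModel, Field, ValidationError


class Answer(BaseModel):
    is_human: bool
    age: int = Field(..., ge=0, le=5)


# The schema sent as guided_json carries the field types and the numeric bounds
schema = Answer.model_json_schema()
print(schema["properties"]["age"])


def parse_answer(raw: str) -> Optional[Answer]:
    """Hypothetical helper: parse the message content, or return None if it violates the schema."""
    try:
        return Answer.model_validate_json(raw)
    except ValidationError:
        return None


# Hand-written stand-ins for response.json()["choices"][0]["message"]["content"]
good = parse_answer('{"is_human": false, "age": 0}')
truncated = parse_answer('{"is_human": false, "age"')
out_of_range = parse_answer('{"is_human": false, "age": 42}')
print(good, truncated, out_of_range)
```

Validating against the same Answer class that produced the schema keeps the request constraint and the client-side check from drifting apart.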

Regular Expression

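To constrain the output to a pattern instead, pass a regular expression in the guided_regex field.

import requests

# Assumed placeholders: point these at your own vLLM OpenAI-compatible server and served model
vllm_url = "http://localhost:8000/v1/chat/completions"
model = "your-model-name"
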
regex_pattern = r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}"

input_data = {
    "model": model,
    "guided_regex": regex_pattern,
    "messages": [
        {
            "role": "user",
            "content": "You are a helpful assistant.",
        },
        {
            "role": "assistant",
            "content": "Nice to meet you!",
        },
        {
            "role": "user",
            "content": "What is the IP address of the Google DNS servers?",
        }
    ]
}

response = requests.post(url=vllm_url, json=input_data)
print(response.json()["choices"][0]["message"]["content"])

Output:

8.8.8.8
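Note that the pattern above only constrains the shape of the string: it would also accept something like 999.999.999.999. A minimal sketch of a client-side check follows, using only the standard library. The function name is_valid_ipv4 is hypothetical.

```python
import ipaddress
import re

regex_pattern = r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}"


def is_valid_ipv4(text: str) -> bool:
    """Return True only if text matches the pattern and is a real IPv4 address."""
    # Shape check: the same pattern that was sent as guided_regex
    if re.fullmatch(regex_pattern, text) is None:
        return False
    # Value check: every octet must be in the 0-255 range
    try:
        ipaddress.IPv4Address(text)
    except ipaddress.AddressValueError:
        return False
    return True


print(is_valid_ipv4("8.8.8.8"))
print(is_valid_ipv4("999.999.999.999"))
```

The range check could also be folded into the regular expression itself, at the cost of a much longer pattern.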

Multiple Choice

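To restrict the output to one of a fixed set of strings, pass the list of options in the guided_choice field.

import requests

# Assumed placeholders: point these at your own vLLM OpenAI-compatible server and served model
vllm_url = "http://localhost:8000/v1/chat/completions"
model = "your-model-name"
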
choices = ["Positive", "Negative"]

input_data = {
    "model": model,
    "guided_choice": choices,
    "messages": [
        {
            "role": "user",
            "content": "You are a helpful assistant.",
        },
        {
            "role": "assistant",
            "content": "Nice to meet you!",
        },
        {
            "role": "user",
            "content": "Which emotion does this sentence express: I'm glad to help you!",
        }
    ]
}

response = requests.post(url=vllm_url, json=input_data)
print(response.json()["choices"][0]["message"]["content"])

Output:

Positive

From the examples above, we can see that vLLM supports Outlines-style constrained decoding directly through the guided_json, guided_regex, and guided_choice request fields.
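The three request bodies above differ only in which constraint field they carry, so they can be assembled by one small function. This is a rough sketch: build_guided_request and GUIDED_KEYS are hypothetical names, not part of vLLM or Outlines, and the single-constraint check simply mirrors how the examples above are written.

```python
from typing import Any, Dict

# Hypothetical helper names; the field names themselves come from the examples above
GUIDED_KEYS = ("guided_json", "guided_regex", "guided_choice")


def build_guided_request(model: str, prompt: str, **constraint: Any) -> Dict[str, Any]:
    """Build a chat-completions payload carrying exactly one guided-decoding field."""
    if len(constraint) != 1 or next(iter(constraint)) not in GUIDED_KEYS:
        raise ValueError(f"pass exactly one of {GUIDED_KEYS}")
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        **constraint,
    }


payload = build_guided_request(
    "your-model-name",  # assumed placeholder for the served model name
    "Which emotion does this sentence express: I'm glad to help you!",
    guided_choice=["Positive", "Negative"],
)
print(sorted(payload))
```

The resulting payload can then be sent with requests.post(url=vllm_url, json=payload), exactly as in the examples.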

References

Structuring Model Outputs Using the Outlines Tool

Implementation of Using Finite-State Machine to Constrain Large Language Model Decoding
