
Using the Integrated Outlines Tool for Constrained Decoding in the vLLM Accelerated Inference Framework

Last Updated on 2024-09-06 by Clay

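I have recently integrated quite a few Outlines-based applications into my current workflow, and the setup I use most often pairs Outlines with vLLM. For some reason, though, the documentation for it was never merged on the vLLM GitHub, so while designing my pipeline I had to keep reading the source of a rejected PR as my documentation XD

For that reason, I decided to put together a set of notes for myself and record them here.

If you would like some background on Outlines and finite-state machines (FSMs) first, you may want to browse my earlier notes on those topics.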


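How to Use Outlines in vLLM

A Specific JSON Format

We can use a pydantic model to specify the JSON format the LLM must generate. The model's JSON schema is passed in the request's guided_json field.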

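import json
import requests

from pydantic import BaseModel, Field

# Assumed placeholders (not from the original post): replace them with your
# served model name and the chat-completions endpoint of your vLLM server.
model = "your-model-name"
vllm_url = "http://localhost:8000/v1/chat/completions"


# Schema for the answer we want the LLM to produce
class Answer(BaseModel):
    is_human: bool
    age: int = Field(..., ge=0, le=5)  # integer constrained to the range 0..5


# Convert the pydantic model into a JSON schema dict for the request
metric_schema = Answer.model_json_schema()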

input_data = {
    "model": model,
    "guided_json": metric_schema,
    "messages": [
        {
            "role": "user",
            "content": "You are a helpful assistant.",
        },
        {
            "role": "assistant",
            "content": "Nice to meet you!",
        },
        {
            "role": "user",
            "content": "How old are you? Are you a human?",
        }
    ]
}

response = requests.post(url=vllm_url, json=input_data)
print(json.loads(response.json()["choices"][0]["message"]["content"]))


Output:

{'is_human': False, 'age': 0}
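
Since guided_json in the request above is just the dict returned by Answer.model_json_schema(), a hand-written JSON schema should work the same way. The sketch below builds an equivalent request body with only the standard library and adds a small client-side check of the reply. The schema is my approximation of what pydantic v2 emits for Answer, and build_json_payload, conforms, and the model name are hypothetical names used for illustration, not part of vLLM or Outlines.

```python
import json

# Hand-written approximation of Answer.model_json_schema() (pydantic v2).
ANSWER_SCHEMA = {
    "title": "Answer",
    "type": "object",
    "properties": {
        "is_human": {"title": "Is Human", "type": "boolean"},
        "age": {"title": "Age", "type": "integer", "minimum": 0, "maximum": 5},
    },
    "required": ["is_human", "age"],
}


def build_json_payload(model: str, question: str) -> dict:
    """Build a chat-completions request body constrained by ANSWER_SCHEMA."""
    return {
        "model": model,
        "guided_json": ANSWER_SCHEMA,
        "messages": [{"role": "user", "content": question}],
    }


def conforms(raw: str) -> bool:
    """Check the two required fields and the 0..5 range on age."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict):
        return False
    age = data.get("age")
    return (
        isinstance(data.get("is_human"), bool)
        # bool is a subclass of int in Python, so exclude it explicitly
        and isinstance(age, int)
        and not isinstance(age, bool)
        and 0 <= age <= 5
    )


# "your-model-name" is a placeholder, not a real model identifier.
payload = build_json_payload("your-model-name", "How old are you? Are you a human?")
print(json.dumps(payload, indent=2))
print(conforms('{"is_human": false, "age": 0}'))
```

The resulting payload can be sent with requests.post in the same way as in the example above.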



Regular Expressions

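import requests

# `model` and `vllm_url` must already be defined: the served model name and
# the URL of the vLLM chat-completions endpoint.

# Four groups of 1-3 digits separated by dots (an IPv4-like pattern)
regex_pattern = r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}"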

input_data = {
    "model": model,
    "guided_regex": regex_pattern,
    "messages": [
        {
            "role": "user",
            "content": "You are a helpful assistant.",
        },
        {
            "role": "assistant",
            "content": "Nice to meet you!",
        },
        {
            "role": "user",
            "content": "What is the IP address of the Google DNS servers?",
        }
    ]
}

response = requests.post(url=vllm_url, json=input_data)
print(response.json()["choices"][0]["message"]["content"])


Output:

8.8.8.8
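
One thing worth noting about the pattern above: \d{1,3} allows any one to three digits, so the constraint also admits strings such as 999.999.999.999. The sketch below, using only the standard library, shows the difference between matching the pattern and being a valid IPv4 address. matches_pattern and is_valid_ipv4 are hypothetical helpers for checking a reply on the client side; they are not part of vLLM or Outlines.

```python
import ipaddress
import re

# Same pattern as the one passed via guided_regex.
REGEX_PATTERN = r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}"


def matches_pattern(text: str) -> bool:
    """True if the whole string matches the regex pattern."""
    return re.fullmatch(REGEX_PATTERN, text) is not None


def is_valid_ipv4(text: str) -> bool:
    """True if the string matches the pattern and parses as an IPv4 address."""
    if not matches_pattern(text):
        return False
    try:
        ipaddress.IPv4Address(text)
    except ValueError:
        return False
    return True


for candidate in ["8.8.8.8", "999.999.999.999", "8.8.8"]:
    print(candidate, matches_pattern(candidate), is_valid_ipv4(candidate))
```

If strict validity matters, either tighten the regex so each octet stays within 0-255 or keep a check like this after the response comes back.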



Multiple Choice

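import requests

# `model` and `vllm_url` must already be defined: the served model name and
# the URL of the vLLM chat-completions endpoint.

# The reply is restricted to exactly one of these strings
choices = ["Positive", "Negative"]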

input_data = {
    "model": model,
    "guided_choice": choices,
    "messages": [
        {
            "role": "user",
            "content": "You are a helpful assistant.",
        },
        {
            "role": "assistant",
            "content": "Nice to meet you!",
        },
        {
            "role": "user",
            "content": "Which emotion does this sentence express: I'm glad to help you!",
        }
    ]
}

response = requests.post(url=vllm_url, json=input_data)
print(response.json()["choices"][0]["message"]["content"])


Output:

Positive
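
Conceptually, a choice constraint behaves like a regular expression that is an alternation of the escaped options. The sketch below illustrates that equivalence with the standard library only. It is meant as a mental model rather than a statement about vLLM internals, and choices_to_regex is a hypothetical helper.

```python
import re


def choices_to_regex(choices: list[str]) -> str:
    """Build a regex that matches exactly one of the given strings."""
    if not choices:
        raise ValueError("choices must not be empty")
    # Escape each option so characters like "." or "?" are taken literally.
    return "(" + "|".join(re.escape(c) for c in choices) + ")"


choices = ["Positive", "Negative"]
pattern = choices_to_regex(choices)
print(pattern)

# Only the listed options match in full; anything else is rejected.
for reply in ["Positive", "Negative", "Neutral", "Positive!"]:
    print(reply, re.fullmatch(pattern, reply) is not None)
```

Under this view, a fixed set of labels is simply the most restrictive kind of regex constraint, which is why it suits classification-style prompts like the one above.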



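As the examples above show, vLLM supports essentially all of these Outlines use cases out of the box. I am recording them here purely so that I can look them up again later.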
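
The three requests above differ only in which guided_* field they set, so they can share one helper. The following is a minimal sketch under that observation. build_guided_payload and the model name are hypothetical, and the rule that exactly one constraint is set per request is my assumption about how these fields are meant to be used.

```python
from typing import Any, Optional


def build_guided_payload(
    model: str,
    messages: list[dict[str, str]],
    *,
    guided_json: Optional[dict[str, Any]] = None,
    guided_regex: Optional[str] = None,
    guided_choice: Optional[list[str]] = None,
) -> dict[str, Any]:
    """Return a chat-completions body with exactly one guided_* constraint."""
    constraints = {
        "guided_json": guided_json,
        "guided_regex": guided_regex,
        "guided_choice": guided_choice,
    }
    # Keep only the constraint the caller actually supplied.
    selected = {k: v for k, v in constraints.items() if v is not None}
    if len(selected) != 1:
        raise ValueError("set exactly one of guided_json, guided_regex, guided_choice")
    return {"model": model, "messages": messages, **selected}


messages = [
    {"role": "user", "content": "What is the IP address of the Google DNS servers?"}
]
payload = build_guided_payload(
    "your-model-name",  # placeholder, not a real model identifier
    messages,
    guided_regex=r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}",
)
print(sorted(payload))
```

The returned dict can be posted with requests.post(url=vllm_url, json=payload), as in the examples above.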

