Last Updated on 2024-11-02 by Clay
I have recently set up a number of backend API servers for chatbots. Initially, I took the user's message and returned the entire LLM-generated reply to the frontend in one go, but this made for a poor user experience. I then switched to HTTP streaming, sending each generated token to the frontend as it was produced. Later, I found that on some users' devices the streamed chunks arrived glued together (the classic "sticky packet" problem), so I eventually switched to WebSocket.
Recently, my frontend colleagues and I discussed whether Server-Sent Events (SSE) might be a better option, so I started exploring how to build an SSE API with FastAPI.
A quick look shows that SSE is also built on plain HTTP and can be considered a lightweight alternative to WebSocket: the connection is one-way (server to client), which is exactly the shape of a chatbot reply stream.
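To see that no special transport is involved, here is a minimal sketch of my own (not from the original setup) that emits SSE frames using nothing but FastAPI's built-in StreamingResponse. Each event is plain text: an optional event: line, a data: line, and a blank line as the frame delimiter.

import asyncio
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

async def raw_sse():
    # Emit three hand-framed SSE events, one per second
    for i in range(3):
        yield f"event: message\ndata: tick {i}\n\n"
        await asyncio.sleep(1)

@app.get("/api/raw-sse")
async def raw_sse_endpoint():
    # The only SSE-specific part is the media type
    return StreamingResponse(raw_sse(), media_type="text/event-stream")

Packages such as sse-starlette simply wrap this framing for us and take care of details like keep-alive pings.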
First, install the following package:
pip install sse-starlette
Next, you can perform a simple test:
import asyncio

from fastapi import FastAPI
from sse_starlette.sse import EventSourceResponse


app = FastAPI()


async def event_generator(sent: str):
    # Yield one character at a time, pausing briefly to simulate token generation
    for char in sent:
        yield {"event": "message", "data": char}
        await asyncio.sleep(0.2)


@app.get("/api/chatbot/stream")
async def sse_endpoint(sent: str):
    return EventSourceResponse(event_generator(sent))
This is a classic echo bot that simply repeats the input sentence character by character; it stands in for an LLM here so we can test the streaming behavior. You can start the server with the following command:
uvicorn app:app --port 8080 --reload
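In a real chatbot backend, the character-by-character loop would be replaced by the model's own token stream. A minimal sketch, reusing app and EventSourceResponse from the test server above and assuming a hypothetical generate_tokens() async iterator that wraps your LLM client's streaming API:

async def llm_event_generator(prompt: str):
    # generate_tokens() is hypothetical: a stand-in for a real LLM client's
    # streaming call that yields tokens as they are produced
    async for token in generate_tokens(prompt):
        yield {"event": "message", "data": token}

@app.get("/api/chatbot/llm-stream")
async def llm_sse_endpoint(prompt: str):
    return EventSourceResponse(llm_event_generator(prompt))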
Back to our echo server: once we make the following request:
import httpx


url = "http://127.0.0.1:8080/api/chatbot/stream"
params = {"sent": "你好,今天天氣不錯~"}

# Stream the response so each SSE frame can be printed as it arrives
with httpx.stream("GET", url, params=params) as response:
    for line in response.iter_text():
        if line:
            print(line)
Output:
event: message
data: 你
event: message
data: 好
event: message
data: ,
event: message
data: 今
event: message
data: 天
event: message
data: 天
event: message
data: 氣
event: message
data: 不
event: message
data: 錯
event: message
data: ~
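Each event arrives as an event: line followed by a data: line. To rebuild the full reply on the client side, one straightforward approach (my own sketch, not part of the original test) is to collect only the data: payloads:

import httpx

url = "http://127.0.0.1:8080/api/chatbot/stream"
params = {"sent": "你好,今天天氣不錯~"}

chunks = []
with httpx.stream("GET", url, params=params) as response:
    # iter_lines() yields the body line by line, so we can keep the
    # "data:" fields and skip the "event:" lines and blank separators
    for line in response.iter_lines():
        if line.startswith("data:"):
            chunks.append(line[len("data:"):].strip())

print("".join(chunks))  # the original sentence, reassembled

In a browser, the built-in EventSource API performs this parsing automatically, which is part of what makes SSE attractive on the frontend.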
(Notes are not yet fully organized)