
Python

Using the Integrated Outlines Tool for Decoding Constraints in the vLLM Inference Acceleration Framework

Recently, I integrated several applications of Outlines into my current workflow. The one I use most often is its integration with vLLM. However, for some reason its documentation was never merged into the vLLM GitHub repository, so while designing my process I had to keep referring to the source code of a rejected PR for guidance XD


Implementation of Using Finite-State Machine to Constrain Large Language Model Decoding

This is a simple Python implementation for testing how a Finite-State Machine (FSM) can constrain a Large Language Model (LLM) so that it decodes responses in a specific format. It also serves as an introduction to the concept behind the Outlines tool. Of course, my implementation is far simpler than the actual Outlines tool.
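To give a rough feel for the idea, here is a minimal sketch of FSM-constrained decoding. It is not the implementation from the post or from Outlines: the vocabulary, the transition table, and the `fake_logits` function (a random stand-in for a real model's forward pass) are all hypothetical and exist only for illustration. At each step, tokens the FSM does not allow in the current state are masked out before the next token is chosen.

```python
import math
import random

# Hypothetical toy vocabulary; a real LLM has tens of thousands of tokens.
VOCAB = ["yes", "no", "maybe", "{", "}", '"answer"', ":", '"', "<eos>"]

# FSM transitions: state -> {allowed token: next state}.
# The only accepted sequences are: { "answer" : " yes|no " } <eos>
FSM = {
    0: {"{": 1},
    1: {'"answer"': 2},
    2: {":": 3},
    3: {'"': 4},
    4: {"yes": 5, "no": 5},
    5: {'"': 6},
    6: {"}": 7},
    7: {"<eos>": 8},
}
FINAL_STATE = 8


def fake_logits(rng):
    """Stand-in for a model forward pass: one random score per token."""
    return [rng.uniform(-1.0, 1.0) for _ in VOCAB]


def constrained_decode(seed=0):
    """Greedy decoding where the FSM masks out disallowed tokens."""
    rng = random.Random(seed)
    state, output = 0, []
    while state != FINAL_STATE:
        logits = fake_logits(rng)
        allowed = FSM[state]
        # Disallowed tokens get a score of -inf so they can never be picked.
        masked = [
            score if token in allowed else -math.inf
            for token, score in zip(VOCAB, logits)
        ]
        best = max(range(len(VOCAB)), key=masked.__getitem__)
        token = VOCAB[best]
        output.append(token)
        state = allowed[token]  # advance the FSM
    return output


if __name__ == "__main__":
    print("".join(constrained_decode()[:-1]))
```

Whatever scores the stand-in model produces, the output always follows the required format, because the model only ever gets to choose among the tokens the FSM permits.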


Using CuPy to Accelerate Matrix Operations with GPU

Introduction

CuPy is an open-source, GPU-accelerated numerical computation library designed for deep learning and scientific computing. It shares many methods and functions with the popular NumPy package but runs its computations on the GPU. In short, tasks that benefit from parallel computation, such as matrix operations, can be accelerated significantly with CuPy.


[Python] Use `httpx` To Replace `requests` For Asynchronous Requests

In Python, we often use the `requests` module for HTTP requests. However, `requests` can become a bottleneck when connecting frontend and backend services, because it handles requests synchronously. Recently, Kubernetes probes against my service were blocked because of `requests`, which led to the unintended deletion of my service container. In such scenarios, `httpx`, which supports asynchronous requests, may be a more suitable module.
