Skip to content

Python

Using the Integrated Outlines Tool for Decoding Constraints in the vLLM Inference Acceleration Framework

Recently, I integrated several applications of Outlines into my current workflow. Among them, the one I use most frequently is with vLLM. However, for some reason, its documentation has not been merged into the vLLM GitHub repository, so while designing the process, I had to constantly refer to the source code of a rejected PR for guidance XD

Read More »Using the Integrated Outlines Tool for Decoding Constraints in the vLLM Inference Acceleration Framework

Implementation of Using Finite-State Machine to Constrain Large Language Model Decoding

This is a simple Python implementation, used to test Finite-State Machine (FSM) constraints for a Large Language Model (LLM) to decode responses in a specific format. It also serves as an introduction to the concept behind the Outlines tool. Of course, my implementation is far simpler compared to the actual Outlines tool.

Read More »Implementation of Using Finite-State Machine to Constrain Large Language Model Decoding

Using CuPy to Accelerate Matrix Operations with GPU

Introduction

CuPy is an open-source GPU-accelerated numerical computation library designed for deep learning and scientific computing. It shares many of the same methods and functions as the popular NumPy package in Python but extends its capabilities to perform computations on the GPU. In short, tasks that can benefit from parallel computation on the GPU, such as matrix operations, can achieve significant acceleration with CuPy.

Read More »Using CuPy to Accelerate Matrix Operations with GPU

[Python] Use `httpx` To Replace `requests` For Asynchronous Requests

In Python programming, we often use the requests module for HTTP requests. However, requests can become a bottleneck when connecting frontend and backend services due to its synchronous request handling. Recently, I experienced Kubernetes probe blockages caused by using requests, which led to the unintended deletion of my service container. In such scenarios, httpx might be a more suitable module for asynchronous request handling.

Read More »[Python] Use `httpx` To Replace `requests` For Asynchronous Requests

Use `snapshot_download` To Download The Models Of HuggingFace Hub

Introduction

HuggingFace Model Hub is now a widely recognized and essential open-source platform for every one. Every day, countless individuals and organizations upload their latest trained models (including those for text, images, speech, and other domains) to this platform. It can be said that anyone working in AI-related fields frequently browses the HuggingFace platform website.

Read More »Use `snapshot_download` To Download The Models Of HuggingFace Hub

PaddleOCR: A Framework and Model Specialized in Chinese Optical Character Recognition (OCR)

Introduction

Recently, I have been exploring models used for Optical Character Recognition (OCR). In the past, OCR was a very popular research field as it was one of the earliest practical applications of computer vision. Today, OCR has become a very mature task, and you can easily find high-performance open-source models online.

Read More »PaddleOCR: A Framework and Model Specialized in Chinese Optical Character Recognition (OCR)