
September 2024

Implementation of Using Finite-State Machine to Constrain Large Language Model Decoding

Last Updated on 2024-09-05 by Clay

This post walks through a simple Python implementation that uses a Finite-State Machine (FSM) to constrain the decoding of a Large Language Model (LLM), so that its responses follow a specific format. It also serves as an introduction to the concept behind the Outlines tool, although my implementation is far simpler than Outlines itself.
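To make the idea concrete, here is a minimal sketch of FSM-constrained decoding. It is not the code from the post and not how Outlines is implemented: the vocabulary, the state table, and the `toy_logits` function are all hypothetical stand-ins for a real tokenizer and model. At each step, every token the current state does not allow is masked to negative infinity before the highest-scoring token is picked.

```python
import math

# Hypothetical toy vocabulary; a real LLM has tens of thousands of tokens.
VOCAB = ["{", "}", '"answer"', ":", '"yes"', '"no"', "hello", "<eos>"]

# Hypothetical FSM: state -> {allowed token: next state}.
# It only accepts strings of the form {"answer":"yes"} or {"answer":"no"}.
FSM = {
    0: {"{": 1},
    1: {'"answer"': 2},
    2: {":": 3},
    3: {'"yes"': 4, '"no"': 4},
    4: {"}": 5},
    5: {"<eos>": 6},
}
FINAL_STATE = 6


def toy_logits(step):
    """Stand-in for a model forward pass (an assumption, not a real model).

    Returns one fake score per vocabulary entry and deliberately gives the
    free-text token "hello" the highest score at every step.
    """
    return [
        10.0 if tok == "hello" else float((i * 7 + step * 3) % 5)
        for i, tok in enumerate(VOCAB)
    ]


def constrained_decode(logits_fn, max_steps=20):
    """Greedy decoding where the FSM masks out disallowed tokens."""
    state, output = 0, []
    for step in range(max_steps):
        if state == FINAL_STATE:
            break
        logits = logits_fn(step)
        allowed = FSM[state]
        # Disallowed tokens get -inf, so they can never win the argmax.
        masked = [
            score if tok in allowed else -math.inf
            for tok, score in zip(VOCAB, logits)
        ]
        best = max(range(len(VOCAB)), key=masked.__getitem__)
        token = VOCAB[best]
        output.append(token)
        state = allowed[token]  # follow the FSM transition
    return output


tokens = constrained_decode(toy_logits)
print("".join(t for t in tokens if t != "<eos>"))
```

Even though the toy scorer always prefers "hello", that token never appears in the output, because no FSM state allows it. With a real model, the same masking would be applied to the logits returned by each forward pass.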


Structuring Model Outputs Using the Outlines Tool

Last Updated on 2024-09-03 by Clay

When applying Large Language Models (LLMs) in real-world scenarios, we often need more than free-form text generation. We might want the model to return a specific structure, such as the answer to a multiple-choice question or a rating. In such cases, transformers-based models can use the Outlines tool directly.


Implementing Streamed Output Token Generation Using TextStreamer and TextIteratorStreamer in HuggingFace Transformers

Last Updated on 2024-09-01 by Clay

Introduction

Generative models are becoming increasingly powerful, and independent researchers are deploying one open-source Large Language Model (LLM) after another. However, when using an LLM for inference, waiting for a long response to finish generating can be quite time-consuming.
