Machine Learning

[Paper Reading] Mistral 7B

Clay
2024-07-21, updated 2024-07-25
Machine Learning

Introduction

Mistral 7B is a large language model (LLM) released on September 27, 2023. It was trained by the Mistral AI team, which also published the model weights. Interestingly, it uses the highly permissive Apache 2.0 license, unlike Llama 2, which ships under its own Llama license terms. In that sense, Mistral 7B is truly “open source” (the Llama 2 license requires requesting a separate license from Meta once a service exceeds 700 million monthly active users).

PaddleOCR: A Framework and Model Specialized in Chinese Optical Character Recognition (OCR)

Clay
2024-07-20
Machine Learning, Python

Introduction

Recently, I have been exploring models used for Optical Character Recognition (OCR). In the past, OCR was a very popular research field as it was one of the earliest practical applications of computer vision. Today, OCR has become a very mature task, and you can easily find high-performance open-source models online.

NuExtract: A Large Language Model For Information Extraction

Clay
2024-07-10, updated 2024-07-20
Machine Learning

Introduction

In today’s era of flourishing large language models, researchers and companies are racking their brains to apply these models to their work. However, speaking personally, the performance of current language models is still not strong enough and their application scenarios are limited; on many tasks they still fall far short of humans.

But there is one type of task for which large language models are naturally well suited: information extraction, in almost any scenario. That is the focus of the model I want to introduce today, NuExtract.

[Machine Learning] Note Of SiLU Activation Function

Clay
2024-06-06
Machine Learning, PyTorch

Introduction

The SiLU (Sigmoid Linear Unit) activation function is closely related to the Swish function: Swish simply adds a trainable beta parameter, and with beta fixed to 1 the two are identical. Many large language models (LLMs) also adopt SiLU, mainly those that use activation functions other than ReLU, such as the classic Llama architecture.
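To make the relationship concrete, here is a minimal sketch in plain Python (no PyTorch required). The function names `sigmoid`, `swish`, and `silu` are my own choices for illustration, and `beta` is treated as an ordinary argument rather than a trained parameter.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic sigmoid for a single float."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # avoids overflow for large negative x
    return z / (1.0 + z)


def swish(x: float, beta: float = 1.0) -> float:
    """Swish: x * sigmoid(beta * x). In a real model beta may be trainable."""
    return x * sigmoid(beta * x)


def silu(x: float) -> float:
    """SiLU is Swish with beta fixed to 1."""
    return swish(x, beta=1.0)


if __name__ == "__main__":
    for value in (-2.0, -1.0, 0.0, 1.0, 2.0):
        print(f"x={value:+.1f}  silu={silu(value):+.4f}  swish(beta=0.5)={swish(value, 0.5):+.4f}")
```

In PyTorch the same activation is available as `torch.nn.SiLU` or `torch.nn.functional.silu`, so a hand-written version like this is only useful for understanding the formula.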

Note Of Unsloth Accelerate Fine-tuning Open Source Project

Clay
2024-06-04, updated 2024-06-05
Machine Learning, Python

Introduction

For several months, I have benefited greatly from the Unsloth project, primarily because a significant part of my job involves fine-tuning large language models (LLMs). Fine-tuning LLMs is extremely time-consuming; aside from data collection, the biggest time sink is the endless GPU-powered fine-tuning process.

[Paper Reading] Kangaroo: Lossless Self-Speculative Decoding via Double Early Exiting

Clay
2024-06-03, updated 2024-07-25
Machine Learning, Python

Introduction

Kangaroo is an inference acceleration framework proposed by Huawei Noah’s Ark Lab. It replaces the small draft model used in the original speculative decoding with a shallow sub-network of the large model itself. Additionally, it uses a separately trained adapter together with the model’s own decoding head to generate speculative tokens, which are then verified by the large model. The subsequent operations are quite similar to the original speculative decoding process.
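The draft-then-verify idea can be sketched with toy "models" that map a token sequence to the next token. This is only an illustration under my own assumptions, not Kangaroo's implementation: `speculative_step`, `draft_next`, `target_next`, and `num_draft` are hypothetical names, decoding is greedy, and verification is done token by token for readability, whereas a real system verifies all drafted tokens in one parallel forward pass.

```python
from typing import Callable, List

# A toy "model": takes the token sequence so far, returns the next token id.
NextToken = Callable[[List[int]], int]


def speculative_step(
    prefix: List[int],
    draft_next: NextToken,
    target_next: NextToken,
    num_draft: int = 4,
) -> List[int]:
    """Run one draft-then-verify round and return the newly accepted tokens."""
    # Draft phase: the cheap model proposes num_draft tokens autoregressively.
    drafted: List[int] = []
    for _ in range(num_draft):
        drafted.append(draft_next(prefix + drafted))

    # Verify phase: keep drafted tokens while they match the target model.
    accepted: List[int] = []
    for token in drafted:
        expected = target_next(prefix + accepted)
        if token != expected:
            # First mismatch: take the target's token instead and stop.
            accepted.append(expected)
            return accepted
        accepted.append(token)

    # Every draft was accepted, so the target contributes one extra token.
    accepted.append(target_next(prefix + accepted))
    return accepted


if __name__ == "__main__":
    def target(seq: List[int]) -> int:
        return (seq[-1] + 1) % 10  # toy target: count upward modulo 10

    def draft(seq: List[int]) -> int:
        # Toy draft: agrees with the target except right after token 3.
        return 0 if seq[-1] == 3 else target(seq)

    print(speculative_step([1], draft, target))
```

Because every accepted token is either confirmed or replaced by the target model, the output matches what greedy decoding with the target model alone would produce, which is the sense in which this style of decoding is lossless.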

Defense Note Against Prompt Injection Attack

Clay
2024-02-26
Machine Learning

What Is a Prompt Injection Attack?

Prompt injection attacks are a burgeoning security concern, primarily targeting large language models (LLMs) or other AI-related domains.

[Solved] Where Does Loss Calculation Begin When Multiple `response_template` Exist in Training Data Using SFTTrainer?

Clay
2024-02-25
Machine Learning

Problem

SFTTrainer is an LLM fine-tuning tool provided by the Hugging Face team that makes it easy to adjust many hyperparameters and configuration options for a fine-tuning task. In this process, response_template is a special string template that we pass to the tool; the loss is computed only on the response tokens that come right after it.
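The masking idea can be sketched in plain Python. This is not the library's code: `find_template_starts`, `mask_prompt_labels`, and the `occurrence` argument are hypothetical names I introduce for illustration, and the sketch deliberately does not claim which occurrence the real tool picks. The only fact borrowed from PyTorch is that `-100` is the default `ignore_index` of its cross-entropy loss.

```python
from typing import List

IGNORE_INDEX = -100  # default ignore_index of PyTorch's CrossEntropyLoss


def find_template_starts(token_ids: List[int], template_ids: List[int]) -> List[int]:
    """Return every position where template_ids occurs inside token_ids."""
    n, m = len(token_ids), len(template_ids)
    return [i for i in range(n - m + 1) if token_ids[i:i + m] == template_ids]


def mask_prompt_labels(
    token_ids: List[int],
    template_ids: List[int],
    occurrence: str = "first",
) -> List[int]:
    """Build labels where everything up to the end of the chosen template is ignored.

    occurrence is a hypothetical switch ("first" or "last") used only to show
    how the choice changes where the loss begins.
    """
    starts = find_template_starts(token_ids, template_ids)
    if not starts:
        # No template found: nothing contributes to the loss.
        return [IGNORE_INDEX] * len(token_ids)
    start = starts[0] if occurrence == "first" else starts[-1]
    response_begin = start + len(template_ids)
    return [IGNORE_INDEX] * response_begin + list(token_ids[response_begin:])


if __name__ == "__main__":
    tokens = [5, 9, 9, 1, 2, 9, 9, 3]  # the template [9, 9] appears twice
    template = [9, 9]
    print(mask_prompt_labels(tokens, template, "first"))
    print(mask_prompt_labels(tokens, template, "last"))
```

With two occurrences of the template, the two settings produce very different label masks, which is why it matters to know which occurrence the trainer actually uses.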

[Paper Reading] Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection

Clay
2024-01-22, updated 2024-07-25
Machine Learning

Introduction

Retrieval-augmented generation (RAG) is a well-known architecture in current applications of large language models (LLMs). It uses “retrieval” to supply the model with knowledge it did not acquire during training, enabling the model to answer questions in the context of specific information.

[Paper Reading] QLoRA: Efficient Finetuning of Quantized LLMs

Clay
2024-01-21, updated 2024-07-25
Machine Learning

Introduction

The wave of large models has been unstoppable since the release of ChatGPT in November 2022. Since then, the scale of open-source large language models (LLMs) has continued to grow, with LLaMA-2-70B and Falcon-180B as two examples.
