Using vLLM To Accelerate The Decoding Of Large Language Models
Introduction
vLLM is a large language model (LLM) inference acceleration framework developed by a research team at the University of California, Berkeley. It uses PagedAttention to improve GPU VRAM utilization, and this method does not change the model architecture.
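The core idea of PagedAttention is to store the KV cache in small fixed-size blocks that need not be contiguous, so memory is allocated on demand instead of reserved up front for the maximum sequence length. Below is a toy Python sketch of that block-table bookkeeping; it is an illustration of the idea only, not vLLM's actual implementation (the class names, the block size of 4, and the pure-Python pool are all invented here for clarity — real vLLM manages GPU VRAM in CUDA).

```python
# Toy sketch of the paged KV-cache bookkeeping behind PagedAttention.
# Hypothetical names; real vLLM manages GPU memory, not Python lists.

BLOCK_SIZE = 4  # tokens per cache block (illustrative; vLLM often uses 16)

class BlockAllocator:
    """Hands out fixed-size cache blocks from a shared pool."""
    def __init__(self, num_blocks):
        self.free_blocks = list(range(num_blocks))

    def allocate(self):
        if not self.free_blocks:
            raise MemoryError("out of KV-cache blocks")
        return self.free_blocks.pop()

    def free(self, block_id):
        self.free_blocks.append(block_id)

class Sequence:
    """Maps a sequence's token positions to non-contiguous physical blocks."""
    def __init__(self, allocator):
        self.allocator = allocator
        self.block_table = []  # logical block index -> physical block id
        self.num_tokens = 0

    def append_token(self):
        # A new block is allocated only when the last one is full,
        # so unused capacity is never reserved ahead of time.
        if self.num_tokens % BLOCK_SIZE == 0:
            self.block_table.append(self.allocator.allocate())
        self.num_tokens += 1

    def release(self):
        # Finished sequences return their blocks to the shared pool.
        for b in self.block_table:
            self.allocator.free(b)
        self.block_table = []

allocator = BlockAllocator(num_blocks=8)
seq = Sequence(allocator)
for _ in range(6):               # caching 6 tokens needs 2 blocks of size 4
    seq.append_token()
print(len(seq.block_table))      # 2
seq.release()
print(len(allocator.free_blocks))  # 8 -- all blocks back in the pool
```

Because blocks are returned to a shared pool as soon as a request finishes, many concurrent sequences can share the same VRAM budget, which is where the throughput gain comes from.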