
January 2024

[Paper Reading] Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection

Last Updated on 2024-07-25 by Clay

Introduction

Retrieval-Augmented Generation (RAG) is a well-known architecture in the current use of Large Language Models (LLMs). It adds a "retrieval" step that supplies the model with prior knowledge it lacked during training, enabling it to answer questions grounded in the retrieved context.
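As a rough sketch of the retrieve-then-generate flow behind RAG (the toy corpus, lexical retriever, and prompt template below are hypothetical stand-ins, not Self-RAG's actual components):

```python
corpus = {
    "doc1": "Self-RAG trains an LLM to retrieve, generate, and critique its own output.",
    "doc2": "RAG supplies retrieved passages as extra context at inference time.",
}

def retrieve(query: str, k: int = 1) -> list[str]:
    # Toy lexical scoring: count words shared between the query and each document.
    words = set(query.lower().split())
    scores = {doc_id: len(words & set(text.lower().split()))
              for doc_id, text in corpus.items()}
    top = sorted(scores, key=scores.get, reverse=True)[:k]
    return [corpus[doc_id] for doc_id in top]

def build_prompt(query: str) -> str:
    # Prepend the retrieved passages so the model answers in their context.
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

print(build_prompt("What does Self-RAG do?"))
```

In a real pipeline, the returned prompt is passed to the LLM; Self-RAG's contribution is teaching the model itself when to retrieve and how to critique what it generates.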

Read More »

[Paper Reading] The Forward-Forward Algorithm: Some Preliminary Investigation

Last Updated on 2024-07-25 by Clay

Introduction

Paper link: https://arxiv.org/abs/2212.13345

This paper was written by Geoffrey Hinton, a renowned figure in the field of deep learning, who was a researcher at Google Brain at the time (he left Google in 2023).

Read More »

[Python] Creating and Auto-Removing Temporary Directories with Python's `tempfile`

Last Updated on 2024-01-07 by Clay

Introduction

Today, while reading the DreamBooth training source code, I came across the built-in tempfile module. I happened to be refactoring a script for merging model layers, and it struck me that the module would make the code more elegant, so I wrote this note.
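As a minimal sketch of the pattern (the `merged_layers.bin` filename is a hypothetical stand-in for a real merged-weights file), `tempfile.TemporaryDirectory` handles the cleanup automatically:

```python
import os
import tempfile

# TemporaryDirectory creates a scratch directory and deletes it, along with
# everything inside, as soon as the context manager exits.
with tempfile.TemporaryDirectory() as tmp_dir:
    merged_path = os.path.join(tmp_dir, "merged_layers.bin")  # hypothetical artifact
    with open(merged_path, "wb") as f:
        f.write(b"\x00" * 16)  # stand-in for merged model weights
    print(os.listdir(tmp_dir))  # ['merged_layers.bin']

# Outside the with-block, the directory and its contents are already gone.
print(os.path.exists(tmp_dir))  # False
```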

Read More »

[Solved] Mistral Cannot Generate eos_token `<|im_end|>` After SFTTrainer Fine-tuning

Last Updated on 2024-01-02 by Clay

Problem

HuggingFace has published an article stating that current LLMs are best fine-tuned with the ChatML format. Normally, text is generated according to three different roles: system, user, and assistant. The format is as follows:
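(A minimal hand-assembled sketch in Python; the example messages are hypothetical, while `<|im_start|>` and `<|im_end|>` are ChatML's delimiter tokens and `<|im_end|>` is the eos_token from the post title.)

```python
# ChatML wraps each turn in <|im_start|>{role} ... <|im_end|>.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hi, how can I help you today?"},
]

chatml = "".join(
    f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
)
print(chatml)
```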

Read More »