
LLM Fine-tuning Note – Differences Between SFT and DPO

Last Updated on 2024-08-02 by Clay

Introduction

When fine-tuning Large Language Models (LLMs), several methods are viable, including Supervised Fine-Tuning (SFT), Reinforcement Learning from Human Feedback (RLHF), and Direct Preference Optimization (DPO). However, they differ in meaningful ways: in the data they require, in the objective they optimize, and in where they fit in the training pipeline.
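As a quick point of reference before going further, the standard training objectives make the contrast between SFT and DPO concrete (this is a sketch following the common formulations, not code or notation from this post: $\pi_\theta$ is the model being trained, $\pi_{\text{ref}}$ a frozen reference model, $y_w$ and $y_l$ the preferred and rejected responses, $\beta$ a temperature-like hyperparameter):

$$\mathcal{L}_{\text{SFT}}(\theta) = -\,\mathbb{E}_{(x,\,y)\sim\mathcal{D}}\big[\log \pi_\theta(y \mid x)\big]$$

$$\mathcal{L}_{\text{DPO}}(\theta) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\text{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\text{ref}}(y_l \mid x)}\right)\right]$$

In short, SFT only maximizes the likelihood of a single reference answer per prompt, while DPO needs paired preference data and pushes the model toward the chosen response relative to the rejected one, anchored to the reference model.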
