LLM Fine-tuning Note - Differences Between SFT and DPO

Last Updated on 2024-08-02 by Clay

Introduction

In the fine-tuning tasks of Large Language Models (LLMs), methods such as Supervised Fine-Tuning (SFT), Reinforcement Learning from Human Feedback (RLHF), and Direct Preference Optimization (DPO) are all viable approaches, but there are important differences among them. In the classic PPO-based approach, LLM training is divided into multiple stages: supervised fine-tuning, reward model training, and reinforcement learning with PPO.
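To make the contrast concrete: SFT trains on prompt–response pairs with an ordinary cross-entropy objective, while DPO trains directly on preference pairs (a prompt with a chosen and a rejected response) and needs no separate reward model. Below is a minimal sketch of the DPO loss in PyTorch. The function name dpo_loss, the beta value, and the toy log-probabilities are illustrative assumptions, not values from this article; the inputs are assumed to be per-example sequence log-probabilities under the trainable policy and a frozen reference model.

import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Sketch of the DPO objective (hypothetical helper, not from the article).

    Each argument is a tensor of sequence log-probabilities log pi(y | x)
    for the chosen or rejected response, under the policy or reference model.
    """
    # Log-ratio of policy to frozen reference for each response
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps

    # DPO pushes the chosen log-ratio above the rejected one,
    # scaled by beta and passed through a log-sigmoid
    logits = beta * (chosen_logratio - rejected_logratio)
    return -F.logsigmoid(logits).mean()

# Toy example: made-up log-probabilities for a batch of two preference pairs
policy_chosen = torch.tensor([-12.0, -15.0])
policy_rejected = torch.tensor([-14.0, -13.0])
ref_chosen = torch.tensor([-13.0, -15.5])
ref_rejected = torch.tensor([-13.5, -13.5])

print(dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected))

In practice these log-probabilities come from summing token log-probs of each response under the two models; the point of the sketch is simply that DPO optimizes a preference margin relative to a reference model, whereas SFT only maximizes the likelihood of the target response.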