LLM Fine-tuning Note – Differences Between SFT and DPO
Introduction
When fine-tuning Large Language Models (LLMs), several methods are viable, including Supervised Fine-Tuning (SFT), Reinforcement Learning from Human Feedback (RLHF), and Direct Preference Optimization (DPO). However, these approaches differ in important ways, as sketched below.
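To make the contrast concrete, here is a minimal sketch in PyTorch of the two training objectives: SFT minimizes token-level cross-entropy against a reference response, while DPO maximizes a preference margin between a chosen and a rejected response relative to a frozen reference model. The function names, tensor arguments, and the beta default are illustrative assumptions, not code from this note, and the sketch assumes per-sequence log-probabilities have already been computed.

```python
import torch
import torch.nn.functional as F

def sft_loss(logits, target_ids):
    # SFT: standard next-token cross-entropy over the response tokens.
    # logits: (batch, seq_len, vocab), target_ids: (batch, seq_len)
    return F.cross_entropy(
        logits.view(-1, logits.size(-1)), target_ids.view(-1)
    )

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # DPO: compare the policy's log-prob gain over the frozen reference
    # model on the chosen vs. rejected response, and push the margin up.
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()
```

In short, SFT only needs prompt–response pairs, whereas DPO needs preference pairs (a chosen and a rejected response per prompt) plus a reference model.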