Direct Preference Optimization (DPO) Training Note
Last Updated on 2024-01-02 by Clay
Introduction
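DPO (Direct Preference Optimization) is a fine-tuning method intended to replace RLHF (Reinforcement Learning from Human Feedback). Instead of training a separate reward model and then optimizing the policy with reinforcement learning, DPO fine-tunes the model directly on pairs of preferred and rejected responses.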
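To make "directly on preference pairs" concrete, here is a minimal sketch of the standard DPO loss for a single pair. This is not the author's training code. It assumes you have already computed the summed token log-probabilities of the chosen and rejected responses under both the model being trained (the policy) and a frozen reference model. The function names and the default `beta=0.1` are illustrative assumptions. The loss is `-log(sigmoid(beta * (chosen_logratio - rejected_logratio)))`, where each log-ratio is the policy log-probability minus the reference log-probability.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z)); avoids overflow in exp for large |z|."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    # For negative z: log(e^z / (1 + e^z)) = z - log(1 + e^z)
    return z - math.log1p(math.exp(z))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # illustrative strength of the implicit KL constraint (assumption)
) -> float:
    """DPO loss for one (chosen, rejected) pair, given summed sequence log-probs."""
    # How much more (or less) likely the policy makes each response than the reference does
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    # Positive margin means the policy favors the chosen response relative to the reference
    margin = beta * (chosen_logratio - rejected_logratio)
    return -log_sigmoid(margin)


# Usage with made-up log-probabilities (hypothetical numbers, not real model outputs)
untrained = dpo_loss(-10.0, -12.0, -10.0, -12.0)  # policy == reference
improved = dpo_loss(-9.0, -13.0, -10.0, -12.0)    # policy shifted toward the chosen answer
print(untrained, improved)
```

When the policy still equals the reference, the margin is zero and the loss is `log 2` (about 0.6931). Pushing probability mass toward the chosen response and away from the rejected one lowers the loss, which is the only signal DPO needs: no reward model and no RL rollout loop.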