Notes on KTOTrainer (Kahneman-Tversky Optimization Trainer)
I've been reading, on and off, about a fine-tuning method called Kahneman-Tversky Optimization (KTO) in HuggingFace's official documentation and other online materials. Like DPO, it is a way to align models with human values, but its training data is much easier to prepare, so I'm applying it to my current tasks right away and will study the related papers in detail when I have time.
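To make the data-format difference concrete, here is a minimal sketch in plain Python. It assumes, as I understand the TRL documentation, that DPO-style data pairs a `chosen` and a `rejected` answer for each prompt, while KTOTrainer accepts unpaired rows with `prompt`, `completion`, and a boolean `label`. The sample texts and the helper `to_unpaired` are hypothetical and only for illustration.

```python
# Illustrative records only; the texts are made up for this sketch.

# DPO-style (paired): every prompt needs both a preferred and a rejected answer.
paired = [
    {
        "prompt": "What is the capital of France?",
        "chosen": "The capital of France is Paris.",
        "rejected": "France does not have a capital.",
    },
]


def to_unpaired(rows):
    """Hypothetical helper: split each paired row into two KTO-style rows.

    Each output row holds one completion plus a boolean label
    (True = desirable, False = undesirable).
    """
    out = []
    for row in rows:
        out.append({"prompt": row["prompt"], "completion": row["chosen"], "label": True})
        out.append({"prompt": row["prompt"], "completion": row["rejected"], "label": False})
    return out


# KTO-style (unpaired): one completion per row, so feedback such as a
# thumbs-up or thumbs-down can be used without finding a matching pair.
unpaired = to_unpaired(paired)

for row in unpaired:
    print(row)
```

The practical point is that an unpaired row can be collected on its own, so a single good or bad response is already usable training data; the conversion above is only needed when the data happens to start out in paired form.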