Clay
LLM Fine-tuning Notes - Differences Between SFT and DPO
Introduction
When fine-tuning a large language model (LLM), supervised fine-tuning (SFT), reinforcement learning from human feedback (RLHF), and direct preference optimization (DPO) are all good options, but there are some differences between them.
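Direct Preference Optimization (DPO) Training Method Notes
Introduction
DPO (Direct Preference Optimization) is a fine-tuning method that serves as an alternative to RLHF (Reinforcement Learning from Human Feedback). As is well known, after unsupervised learning a large language model acquires a large amount of knowledge and comprehension ability (some researchers describe this as knowledge being "compressed and stored" in the neural network's weights); after supervised learning it learns to respond to our questions fluently, or in other words, it learns the ability to "converse".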
[Solved] fatal error: portaudio.h: No such file or directory 9 | #include "portaudio.h" | ^~~~~~~~~~~~~ compilation terminated
Problem Description
Today, when I tried to install pyaudio (a Python package commonly used for audio recording) on a new Linux laptop, I ran into an error I had not encountered before:
LeetCode: 1637 Widest Vertical Area Between Two Points Containing No Points Solution Notes
Problem
Given n points on a 2D plane where points[i] = [xi, yi], return the widest vertical area between two points such that no points are inside the area.
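One way to read the statement above: only the x-coordinates matter, so the widest empty vertical area is the largest gap between neighboring x values after sorting. The following is a minimal sketch under that reading, not the post's own solution; the function name max_width_of_vertical_area is an assumption chosen for illustration.

```python
from typing import List


def max_width_of_vertical_area(points: List[List[int]]) -> int:
    """Return the largest gap between adjacent x-coordinates (hypothetical helper)."""
    # A vertical area extends infinitely along y, so y-coordinates are ignored.
    xs = sorted(x for x, _ in points)
    # Compare each x with its successor; default=0 covers inputs with fewer than two points.
    return max((b - a for a, b in zip(xs, xs[1:])), default=0)
```

Sorting dominates the cost, so this sketch runs in O(n log n) time. Points that share an x-coordinate simply contribute a gap of 0.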
LeetCode: 2706-Buy Two Chocolates Solution Notes
Problem
You are given an integer array prices representing the prices of various chocolates in a store. You are also given a single integer money, which represents your initial amount of money.
LeetCode: 661-Image Smoother Solution Notes
Problem
An image smoother is a filter of the size 3 x 3 that can be applied to each cell of an image by rounding down the average of the cell and the eight surrounding cells (i.e., the average of the nine cells in the blue smoother). If one or more of the surrounding cells of a cell is not present, we do not consider it in the average (i.e., the average of the four cells in the red smoother).
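The smoothing rule described above can be sketched directly: for each cell, sum the cells in its 3 x 3 neighborhood that actually exist, then take the floor of the average. This is a minimal sketch assuming a non-empty rectangular grid of integers; the function name image_smoother is an assumption, and it is not necessarily the approach the post takes.

```python
from typing import List


def image_smoother(img: List[List[int]]) -> List[List[int]]:
    """Apply a 3 x 3 floor-average filter to every cell (hypothetical helper)."""
    rows, cols = len(img), len(img[0])
    result = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = 0
            count = 0
            # Clamp the window to the grid so missing neighbors are skipped.
            for rr in range(max(0, r - 1), min(rows, r + 2)):
                for cc in range(max(0, c - 1), min(cols, c + 2)):
                    total += img[rr][cc]
                    count += 1
            # Floor division rounds the average down.
            result[r][c] = total // count
    return result
```

Writing into a separate result grid keeps every average based on the original values; updating img in place would let already-smoothed cells leak into later averages.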
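[Python] A Quick Note on Inserting Line Breaks in the IPython Interactive Interface
Preface
IPython is a system for interactive computing that integrates with a variety of shells and visual interfaces. For example, we can use it in a terminal through the ipython command (provided the package is installed), or through graphical editors/IDEs such as VS Code and PyCharm.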
LeetCode: 2353-Design a Food Rating System Solution Notes
Problem
Design a food rating system that can do the following:
- Modify the rating of a food item listed in the system.
- Return the highest-rated food item for a type of cuisine in the system.