
December 2023

Notes on the Direct Preference Optimization (DPO) Training Method

Last Updated on 2024-02-29 by Clay

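Introduction

DPO (Direct Preference Optimization) is a fine-tuning method that serves as a replacement for RLHF (Reinforcement Learning from Human Feedback). As is well known, large language models acquire a large amount of knowledge and comprehension ability through unsupervised learning (some researchers argue that the knowledge is "compressed and stored" in the neural network weights); after supervised learning, they learn to respond fluently to our questions, or in other words, they learn how to "converse".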


[Solved] fatal error: portaudio.h: No such file or directory 9 | #include "portaudio.h" | ^~~~~~~~~~~~~ compilation terminated

Last Updated on 2023-12-24 by Clay

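Problem Description

Today, when I tried to install pyaudio (a Python package commonly used for audio recording) on a new Linux laptop, I ran into an error I had never encountered before: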


LeetCode: 661-Image Smoother Solution Notes

Last Updated on 2023-12-19 by Clay

Problem

An image smoother is a filter of the size 3 x 3 that can be applied to each cell of an image by rounding down the average of the cell and the eight surrounding cells (i.e., the average of the nine cells in the blue smoother). If one or more of the surrounding cells of a cell is not present, we do not consider it in the average (i.e., the average of the four cells in the red smoother).
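
The rule described above can be sketched as a brute-force pass over the grid. This is only a minimal illustration, not necessarily the solution from the full post: the function name `image_smoother` is a placeholder, and the sketch assumes the image is a non-empty rectangular grid of non-negative integers, so that integer floor division matches "rounding down".

```python
from typing import List


def image_smoother(img: List[List[int]]) -> List[List[int]]:
    """Apply a 3 x 3 averaging filter, ignoring cells outside the image.

    Assumes img is a non-empty rectangular grid of non-negative integers.
    """
    rows, cols = len(img), len(img[0])
    result = [[0] * cols for _ in range(rows)]

    for r in range(rows):
        for c in range(cols):
            total = 0
            count = 0
            # Clamp the 3 x 3 window to the image bounds so that
            # missing neighbours are simply not counted.
            for nr in range(max(0, r - 1), min(rows, r + 2)):
                for nc in range(max(0, c - 1), min(cols, c + 2)):
                    total += img[nr][nc]
                    count += 1
            # Floor division rounds the average down.
            result[r][c] = total // count

    return result


if __name__ == "__main__":
    print(image_smoother([[100, 200, 100], [200, 50, 200], [100, 200, 100]]))
    # [[137, 141, 137], [141, 138, 141], [137, 141, 137]]
```

Each output cell reads only from the original `img`, never from `result`, so earlier smoothed values do not leak into later averages. A corner cell averages four values, an edge cell six, and an interior cell nine.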
