Blog

Supervised Fine-tuning Trainer (SFTTrainer) Training Notes

Clay
2024-01-03
2 Comments
Machine Learning, PyTorch

[Solved] Mistral Does Not Output the eos_token `<|im_end|>` After Fine-tuning with SFTTrainer

Clay
2023-12-31 / 2024-02-20
Machine Learning, PyTorch

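Problem Description

HuggingFace previously published an article suggesting that current LLMs are best trained in the ChatML format. In general, generation is organized around three roles (system, user, and assistant), in the following format: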
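As a rough illustration of the three-role layout, here is a minimal sketch. It assumes the commonly used ChatML delimiters `<|im_start|>` and `<|im_end|>`; the helper name `to_chatml` and the sample messages are hypothetical and are not taken from the original post.

```python
# Hypothetical helper: render a list of role/content messages as ChatML-style text.
# Assumes each turn is wrapped as <|im_start|>{role}\n{content}<|im_end|>\n.
def to_chatml(messages):
    parts = []
    for message in messages:
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    return "".join(parts)


# Illustrative conversation using the three roles named above.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hi, how can I help?"},
]

chatml_text = to_chatml(messages)
print(chatml_text)
```

In this layout every turn, including the assistant's, ends with `<|im_end|>`, which is why that token typically serves as the end-of-sequence marker for models trained this way.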

[Solved][Linux] /bin/bash: warning: shell level (1000) too high, resetting to 1

Clay
2023-12-30
Linux

Problem Description

/bin/bash: warning: shell level (1000) too high, resetting to 1

[C++] The Meaning of ios_base::sync_with_stdio(false) and cin.tie(NULL) in Competitive Programming

Clay
2023-12-29
C++

LLM Fine-tuning Notes – Differences Between SFT and DPO

Clay
2023-12-27
Machine Learning

Introduction

In fine-tuning tasks for large language models (LLMs), supervised fine-tuning (SFT), reinforcement learning from human feedback (RLHF), and direct preference optimization (DPO) are all solid methods, but there are some differences among them.

Direct Preference Optimization (DPO) Training Method Notes

Clay
2023-12-26 / 2024-02-29
Machine Learning, Python, PyTorch

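Introduction

DPO (Direct Preference Optimization) is a fine-tuning method that replaces RLHF (Reinforcement Learning from Human Feedback). As is well known, large language models acquire a large amount of knowledge and comprehension ability through unsupervised learning (some researchers describe this as "compressing and storing" knowledge in the neural network weights); after supervised learning, they learn to respond fluently to our questions, or in other words, they learn the ability to "converse".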

[Solved] fatal error: portaudio.h: No such file or directory 9 | #include "portaudio.h" | ^~~~~~~~~~~~~ compilation terminated

Clay
2023-12-24
Linux, Python

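Problem Description

Today, when I tried to install pyaudio (a Python package commonly used for audio recording) on a new Linux laptop, I ran into an error I had not encountered before: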

LeetCode: 1637 Widest Vertical Area Between Two Points Containing No Points Solution Notes

Clay
2023-12-21
C++, LeetCode, Python

Problem

Given n points on a 2D plane where points[i] = [x_i, y_i], return the widest vertical area between two points such that no points are inside the area.
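One possible approach, sketched here under the assumption that only the x-coordinates matter for a vertical area: sort the x values and take the largest gap between neighbors. The function name `max_width_of_vertical_area` is an illustrative choice, not necessarily the one used in the original post.

```python
def max_width_of_vertical_area(points):
    """Return the largest gap between consecutive sorted x-coordinates."""
    # A vertical area extends infinitely along y, so y-coordinates are ignored.
    xs = sorted(x for x, _ in points)
    # Compare each x with its right-hand neighbor; default=0 covers a single point.
    return max((right - left for left, right in zip(xs, xs[1:])), default=0)


print(max_width_of_vertical_area([[8, 7], [9, 9], [7, 4], [9, 7]]))  # 1
print(max_width_of_vertical_area([[3, 1], [9, 0], [1, 0], [1, 4], [5, 3], [8, 8]]))  # 3
```

Sorting dominates the cost, so this sketch runs in O(n log n) time.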

LeetCode: 2706-Buy Two Chocolates Solution Notes

Clay
2023-12-20
C++, LeetCode, Python

Problem

You are given an integer array prices representing the prices of various chocolates in a store. You are also given a single integer money, which represents your initial amount of money.
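The excerpt above stops before stating the goal. The sketch below assumes the rest of the LeetCode statement: buy exactly two chocolates with the smallest possible total, return the leftover money, and return `money` unchanged if the two cheapest cannot be afforded. The function name `buy_choco` is illustrative.

```python
def buy_choco(prices, money):
    """Return leftover money after buying the two cheapest chocolates, or
    `money` unchanged if that purchase would leave a negative balance."""
    cheapest = second_cheapest = float("inf")
    # Track the two smallest prices in a single pass.
    for price in prices:
        if price < cheapest:
            cheapest, second_cheapest = price, cheapest
        elif price < second_cheapest:
            second_cheapest = price
    leftover = money - (cheapest + second_cheapest)
    return leftover if leftover >= 0 else money


print(buy_choco([1, 2, 2], 3))  # 0
print(buy_choco([3, 2, 3], 3))  # 3
```

Tracking the two minimums avoids sorting, so the sketch runs in O(n) time with constant extra space.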

LeetCode: 661-Image Smoother Solution Notes

Clay
2023-12-19
C++, LeetCode, Python

Problem

An image smoother is a filter of the size 3 x 3 that can be applied to each cell of an image by rounding down the average of the cell and the eight surrounding cells (i.e., the average of the nine cells in the blue smoother). If one or more of the surrounding cells of a cell is not present, we do not consider it in the average (i.e., the average of the four cells in the red smoother).
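The smoothing rule described above can be sketched directly: for each cell, average whichever cells of the 3 x 3 window actually exist and round down. This is a straightforward illustration that assumes a non-empty grid of non-negative integers; the function name `image_smoother` is an illustrative choice.

```python
def image_smoother(img):
    """Apply a 3 x 3 averaging filter, ignoring neighbors outside the image."""
    rows, cols = len(img), len(img[0])
    result = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = count = 0
            # Clamp the window to the image bounds so missing cells are skipped.
            for nr in range(max(0, r - 1), min(rows, r + 2)):
                for nc in range(max(0, c - 1), min(cols, c + 2)):
                    total += img[nr][nc]
                    count += 1
            # Floor division rounds the average down for non-negative values.
            result[r][c] = total // count
    return result


print(image_smoother([[1, 1, 1], [1, 0, 1], [1, 1, 1]]))
# [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
```

Writing into a separate `result` grid keeps every average based on the original pixel values rather than on already-smoothed neighbors.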
