PyTorch

[Solved] RuntimeError: view size is not compatible with input tensor’s size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(…) instead.

Clay
2024-08-21
Machine Learning, PyTorch

Problem Description

When building deep learning models in PyTorch, adjusting the shapes of layers and input/output dimensions is something every AI engineer has to deal with. However, there is a small but interesting pitfall in the view() method of PyTorch:

RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
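A minimal sketch of how this error can be triggered, assuming the cause is a non-contiguous tensor (a transpose is used here as one common example; the variable names are illustrative only). It also shows the two usual workarounds, `reshape()` and `contiguous().view()`:

```python
import torch

x = torch.arange(6).reshape(2, 3)   # contiguous, strides (3, 1)
t = x.t()                           # transposed view: shape (3, 2), strides (1, 3)

try:
    t.view(6)                       # flattening this layout would need a copy
    view_failed = False
except RuntimeError:
    view_failed = True              # this is the error quoted above

# Workaround 1: reshape() returns a view when possible and copies otherwise.
flat_reshape = t.reshape(6)

# Workaround 2: make the memory contiguous explicitly, then view().
flat_contig = t.contiguous().view(6)
```

Both workarounds copy the data in this case, so the result no longer shares memory with `x`.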

[Machine Learning] Note of LayerNorm

Clay
2024-08-19
Machine Learning, PyTorch

The working principle of LayerNorm is as follows:

Calculate mean and variance

$\text{mean} = \mu = \frac{\sum_{i=1}^{N} x_i}{N}$

$\text{variance} = \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
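The two statistics above can be sketched in plain Python so the arithmetic stays visible. This is an illustrative sketch, not the PyTorch implementation: the `eps` value is an assumption, and the learnable scale and shift that `torch.nn.LayerNorm` applies afterwards are omitted.

```python
import math

def layer_norm(x, eps=1e-5):
    """Normalize a list of floats with its own mean and (biased) variance."""
    n = len(x)
    mean = sum(x) / n                              # mu
    var = sum((v - mean) ** 2 for v in x) / n      # sigma^2, divided by N
    # eps is an assumed small constant that avoids division by zero
    return [(v - mean) / math.sqrt(var + eps) for v in x]

normalized = layer_norm([1.0, 2.0, 3.0, 4.0])
```

After normalization the values have a mean of approximately zero and a variance of approximately one.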

[Machine Learning] Note of Cross Entropy Loss

Clay
2024-08-18
Machine Learning, PyTorch

Introduction to Cross Entropy

Cross entropy is a very common loss function in machine learning, particularly in classification tasks, because it quantifies the difference between a model’s predicted class probabilities and the actual class labels.
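As a rough illustration, the loss for a single sample can be computed from raw logits as the negative log of the softmax probability assigned to the true class. The function name and inputs below are hypothetical; the max-subtraction is a standard trick for numerical stability.

```python
import math

def cross_entropy(logits, target):
    """Cross entropy for one sample: -log(softmax(logits)[target])."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

loss_confident_right = cross_entropy([5.0, 0.0], target=0)  # small loss
loss_confident_wrong = cross_entropy([5.0, 0.0], target=1)  # large loss
```

A confident correct prediction gives a loss near zero, while a confident wrong one is penalized heavily.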

[Machine Learning] Note of Activation Function GELU

Clay
2024-08-18
Machine Learning, PyTorch

Gaussian Error Linear Unit (GELU) is an activation function used in machine learning. While it resembles the classic ReLU (Rectified Linear Unit), there are some key differences.
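One way to see the difference is to compare the two functions directly. The sketch below uses the exact GELU definition, x multiplied by the standard normal CDF, written with `math.erf`; it is not the tanh approximation that some implementations use.

```python
import math

def gelu(x):
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def relu(x):
    return max(0.0, x)

# ReLU zeroes every negative input; GELU lets small negative values through.
pairs = [(x, relu(x), gelu(x)) for x in (-2.0, -1.0, 0.0, 1.0, 2.0)]
```

Unlike ReLU, GELU is smooth around zero and slightly negative for negative inputs.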

[Machine Learning] Note of RMSNorm

Clay
2024-08-17
Machine Learning, PyTorch

Introduction to RMSNorm

RMSNorm is an improvement over LayerNorm, often used in the Transformer self-attention mechanism. It aims to mitigate the issues of vanishing and exploding gradients, helping the model converge faster and improve performance.
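A minimal plain-Python sketch of the idea, assuming the usual formulation: each value is divided by the root mean square of the vector, with no mean subtraction. The `eps` value is an assumption, and the learnable gain that real implementations multiply in afterwards is omitted.

```python
import math

def rms_norm(x, eps=1e-6):
    """Divide each element by the root mean square of the vector."""
    mean_square = sum(v * v for v in x) / len(x)
    rms = math.sqrt(mean_square + eps)  # eps is an assumed stabilizer
    return [v / rms for v in x]

normalized = rms_norm([3.0, 4.0])
```

Compared with LayerNorm, this skips computing the mean, which is where the saving in computation comes from.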

[Machine Learning] Note of Rotary Position Embedding (RoPE)

Clay
2024-08-16 (updated 2024-08-17)
Machine Learning, PyTorch

Introduction

(Note: Since this article is imported from my personal Hackmd, some symbols and formatting might not display properly in WordPress. I appreciate your understanding, sorry for any inconvenience.)

RoPE is a method for introducing relative position information into the self-attention mechanism through absolute positional encoding.
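The sentence above can be checked numerically. The sketch below rotates adjacent pairs of a vector by an angle proportional to the absolute position, then shows that the dot product of two rotated vectors depends only on their relative offset. The adjacent-pair layout and `base=10000.0` are assumptions; some implementations pair the first and second halves of the vector instead.

```python
import math

def rope(vec, pos, base=10000.0):
    """Rotate each adjacent pair (vec[i], vec[i+1]) by pos * base**(-i/d)."""
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)   # one frequency per pair
        c, s = math.cos(theta), math.sin(theta)
        x0, x1 = vec[i], vec[i + 1]
        out += [x0 * c - x1 * s, x0 * s + x1 * c]
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

q = [0.3, -1.2, 0.7, 2.0]
k = [1.1, 0.4, -0.5, 0.9]

# Both pairs of positions are 2 apart, so the scores should match.
score_a = dot(rope(q, 5), rope(k, 3))
score_b = dot(rope(q, 12), rope(k, 10))
```

Each position is encoded absolutely, yet the attention score only sees the difference between positions.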

[PyTorch] Using SDPA in 2.0+ to Improve the Computation Speed of Transformer’s Self-Attention Mechanism

Clay
2024-08-15 (updated 2024-08-16)
Machine Learning, PyTorch

SDPA Introduction

Scaled Dot-Product Attention (SDPA) might immediately pop into the minds of those familiar with the Transformer self-attention mechanism:

$Attention(Q,K,V)=softmax(\frac{QK^{T}}{\sqrt{d_{k}}})V$
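As a rough check, the formula can be written out by hand and compared with `torch.nn.functional.scaled_dot_product_attention`, which is available in PyTorch 2.0 and later. The tensor shapes below are hypothetical, and no mask or dropout is applied.

```python
import math
import torch
import torch.nn.functional as F

torch.manual_seed(0)
# Assumed layout: (batch, heads, sequence length, head dimension)
q = torch.randn(1, 2, 4, 8)
k = torch.randn(1, 2, 4, 8)
v = torch.randn(1, 2, 4, 8)

# Manual version: softmax(Q K^T / sqrt(d_k)) V
scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
manual = torch.softmax(scores, dim=-1) @ v

# Built-in version; PyTorch may dispatch to a fused kernel when one is available.
fused = F.scaled_dot_product_attention(q, k, v)
```

The two results should agree up to floating-point tolerance; any speed difference depends on which backend PyTorch selects for the given device and dtype.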

[Paper Reading] RAGAS: Automated Evaluation of Retrieval Augmented Generation

Clay
2024-08-10
Machine Learning, PyTorch

Introduction

The year 2023 witnessed an explosion of generative AI technologies, with a myriad of applications emerging across various domains. In the field of Natural Language Processing (NLP), Large Language Models (LLMs) stand out as one of the most significant advancements. When trained effectively and with hallucinations kept low, LLMs can significantly reduce human effort across a wide range of tasks.

Supervised Fine-tuning Trainer (SFTTrainer) Note

Clay
2024-08-02
Machine Learning, PyTorch

Introduction

Supervised Fine-Tuning (SFT) is one of the most well-known methods for training Large Language Models (LLMs). Essentially, it is similar to traditional language modeling, where the model learns certain knowledge from the training data.

[Machine Learning] Note Of SiLU Activation Function

Clay
2024-06-06
Machine Learning, PyTorch

Introduction

The SiLU (Sigmoid Linear Unit) activation function is similar to the Swish function; Swish simply has an additional trainable beta parameter. Many large language models (LLMs) also adopt this activation, primarily models that explore activation functions other than ReLU, such as the classic Llama architecture.
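The relationship between the two functions can be sketched in a few lines of plain Python. Here `beta` is a fixed argument for illustration only; in a real Swish layer it would be a trainable parameter.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def silu(x):
    """SiLU: x * sigmoid(x)."""
    return x * sigmoid(x)

def swish(x, beta=1.0):
    """Swish: x * sigmoid(beta * x); beta is fixed here, trainable in practice."""
    return x * sigmoid(beta * x)

same_at_beta_one = all(swish(x, 1.0) == silu(x) for x in (-2.0, 0.0, 1.5))
```

With `beta` equal to 1, Swish reduces exactly to SiLU.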
