[Machine Learning] Note of LayerNorm
Last Updated on 2024-08-19 by Clay
The working principle of LayerNorm is as follows:
- Calculate the mean and variance
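The steps above can be sketched in plain Python (a minimal illustration, not the post's own code; the function name and the optional `gamma`/`beta` affine parameters are my own):

```python
import math

def layer_norm(x, gamma=None, beta=None, eps=1e-5):
    """Normalize a feature vector over its last dimension."""
    mean = sum(x) / len(x)                                 # 1. mean
    var = sum((v - mean) ** 2 for v in x) / len(x)         # 2. (biased) variance
    y = [(v - mean) / math.sqrt(var + eps) for v in x]     # 3. normalize
    if gamma is not None:                                  # 4. optional learnable scale/shift
        y = [g * v + b for g, v, b in zip(gamma, y, beta)]
    return y
```

After normalization the output has (approximately) zero mean and unit variance, regardless of the input's scale.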
[Machine Learning] Note of Cross Entropy Loss
Last Updated on 2024-08-18 by Clay
Cross entropy is a very common loss function in machine learning, as it quantifies the difference between a model's predicted class probabilities and the actual class labels, particularly in classification tasks.
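For a single sample, the loss is simply the negative log of the probability the model assigns to the true class. A minimal sketch (illustrative only; the function name is my own):

```python
import math

def cross_entropy(probs, target_index):
    """Cross-entropy loss for one sample over a probability
    distribution: -log of the probability of the true class."""
    return -math.log(probs[target_index])
```

A confident correct prediction gives a loss near zero; as the probability assigned to the true class shrinks, the loss grows without bound.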
Read More »[Machine Learning] Note of Cross Entropy Loss

[Machine Learning] Note of Activation Function GELU
Last Updated on 2024-08-18 by Clay
Gaussian Error Linear Unit (GELU) is an activation function used in machine learning. While it resembles the classic ReLU (Rectified Linear Unit), there are some key differences.
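The exact GELU is x·Φ(x), where Φ is the standard normal CDF; a minimal sketch using `math.erf` (illustrative only, not the post's code):

```python
import math

def gelu(x):
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF,
    written via the error function: Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))
```

Unlike ReLU, GELU is smooth and lets small negative values through (e.g. `gelu(-1.0)` is about -0.159 rather than 0), while behaving like the identity for large positive inputs.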
Read More »[Machine Learning] Note of Activation Function GELU

[Machine Learning] Note of RMSNorm
Last Updated on 2024-08-17 by Clay
RMSNorm is a simplified variant of LayerNorm, often used in Transformer-based models. It aims to mitigate vanishing and exploding gradients, helping the model converge faster and improving performance, while being cheaper to compute.
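The key difference from LayerNorm is that RMSNorm skips the mean subtraction and rescales only by the root mean square. A minimal sketch (illustrative; the function name and optional `gamma` gain are my own):

```python
import math

def rms_norm(x, gamma=None, eps=1e-8):
    """RMSNorm: divide by the root mean square of the vector,
    with no mean subtraction (unlike LayerNorm)."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    y = [v / rms for v in x]
    if gamma is not None:             # optional learnable gain
        y = [g * v for g, v in zip(gamma, y)]
    return y
```

Because it only rescales, RMSNorm preserves the direction of the input vector; the output has unit mean square.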
Read More »[Machine Learning] Note of RMSNorm

[Machine Learning] Note of Rotary Position Embedding (RoPE)
Last Updated on 2024-08-17 by Clay
(Note: since this article was imported from my personal HackMD, some symbols and formatting might not display properly in WordPress. Sorry for any inconvenience, and thank you for your understanding.)
RoPE is a method that encodes absolute positions as rotations of the query and key vectors, so that the resulting self-attention scores depend only on the relative position between tokens.
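Concretely, each consecutive pair of dimensions is rotated by an angle proportional to the token position. A minimal pure-Python sketch (illustrative; the function name is my own, and the frequency schedule follows the common base-10000 convention):

```python
import math

def rope(x, position, base=10000.0):
    """Rotary position embedding for a vector of even length d:
    pair (x[2i], x[2i+1]) is rotated by angle position * theta_i,
    with theta_i = base ** (-2i / d)."""
    d = len(x)
    out = []
    for i in range(0, d, 2):
        theta = base ** (-i / d)          # per-pair frequency
        angle = position * theta
        c, s = math.cos(angle), math.sin(angle)
        out.append(x[i] * c - x[i + 1] * s)   # 2-D rotation of the pair
        out.append(x[i] * s + x[i + 1] * c)
    return out
```

The defining property: the dot product of a rotated query and a rotated key depends only on the difference of their positions, which is how absolute encoding yields relative position information.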
Read More »[Machine Learning] Note of Rotary Position Embedding (RoPE)

Last Updated on 2024-08-16 by Clay
For anyone familiar with the Transformer self-attention mechanism, Scaled Dot-Product Attention (SDPA) is likely the first thing that comes to mind:
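The formula in question is Attention(Q, K, V) = softmax(QKᵀ/√d_k)·V. A minimal pure-Python sketch over lists of row vectors (illustrative only; the function names are my own):

```python
import math

def softmax(row):
    m = max(row)                              # subtract max for numerical stability
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def sdpa(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    scale = math.sqrt(len(K[0]))              # sqrt(d_k)
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in K]
        weights = softmax(scores)             # one convex weight per key
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out
```

Each output row is a convex combination of the value rows, weighted by how strongly the query matches each key.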
Last Updated on 2024-08-14 by Clay
I often watch full-screen videos on my computer and need to get back to work mode on time, but I don't like constantly picking up my phone just to check the current time, as it's quite inconvenient.
Read More »[Linux] tty-clock: A Tool For Displaying Electronic Clock Time On Terminal

[Paper Reading] Lifting the Curse of Multilinguality by Pre-training Modular Transformers
Last Updated on 2024-08-19 by Clay
Cross-lingual Modular (X-Mod) is an interesting language model architecture that modularizes the parameters for different languages into Module Units, allowing the model to use a separate set of parameters when fine-tuning on a new language, thereby (comparatively) avoiding catastrophic forgetting.
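A toy sketch of the routing idea (entirely hypothetical structure, only to illustrate per-language module units; the real X-Mod uses per-language adapter sublayers inside each Transformer block):

```python
class ModularLayer:
    """Toy X-Mod-style layer: one independent parameter set ("module
    unit") per language; inputs are routed to their language's unit."""

    def __init__(self, languages):
        self.modules = {lang: {"scale": 1.0, "offset": 0.0} for lang in languages}

    def add_language(self, lang):
        # A new language gets a fresh module; existing modules stay
        # untouched, which is how forgetting is (comparatively) avoided.
        self.modules[lang] = {"scale": 1.0, "offset": 0.0}

    def forward(self, x, lang):
        p = self.modules[lang]
        return [p["scale"] * v + p["offset"] for v in x]
```

Fine-tuning for a new language only updates (or creates) that language's module, leaving the others' parameters intact.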
Read More »[Paper Reading] Lifting the Curse of Multilinguality by Pre-training Modular Transformers

Stable Diffusion ComfyUI Note 03 - How To Download SD Models
Last Updated on 2024-08-12 by Clay
When using ComfyUI to generate images, we need to combine the capabilities of several different models to form a complete workflow; taken together, these models are what we call Stable Diffusion. Today, I will introduce where to download them.
Read More »Stable Diffusion ComfyUI Note 03 - How To Download SD Models

Note Of Singular Value Decomposition (SVD)
Last Updated on 2024-08-11 by Clay
Singular Value Decomposition (SVD) is a method for decomposing a matrix into the product of three matrices, revealing the rank, the effective dimensionality, and the principal directions of the original matrix. It is often used in dimensionality reduction, compression, and structural analysis.
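A minimal NumPy sketch (illustrative only, assuming NumPy is available; the example matrix is my own):

```python
import numpy as np

# A = U @ diag(s) @ Vt, with U, Vt orthonormal and s the singular values.
A = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [0.0, 0.0]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Reconstruct A from the three factors.
A_rec = U @ np.diag(s) @ Vt

# The number of nonzero singular values equals the rank of A.
rank = int(np.sum(s > 1e-10))
```

Truncating to the largest few singular values gives the best low-rank approximation of A, which is the basis of SVD-based dimensionality reduction and compression.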
Read More »Note Of Singular Value Decomposition (SVD)