[Machine Learning] Note of RMSNorm

Last Updated on 2024-08-17 by Clay

Introduction to RMSNorm

RMSNorm is an improvement over LayerNorm and is often used in Transformer architectures. It aims to mitigate the issues of vanishing and exploding gradients, helping the model converge faster and improving performance. In the original LayerNorm, the input elements are first normalized by calculating the mean and variance, subtracting the mean, and dividing by the standard deviation. RMSNorm simplifies this: it skips the mean subtraction entirely and rescales the input only by its root mean square.
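The idea above can be sketched in a few lines of NumPy. This is a minimal illustration, not the canonical implementation; the function name `rms_norm`, the learnable scale `gamma`, and the epsilon value are assumptions for the example:

```python
import numpy as np

def rms_norm(x, gamma, eps=1e-6):
    # Root mean square over the last (feature) dimension.
    # Unlike LayerNorm, no mean is subtracted and no bias is added.
    rms = np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)
    return x / rms * gamma

# Example: after normalization, the RMS of the output is ~1
x = np.array([1.0, 2.0, 3.0, 4.0])
gamma = np.ones(4)  # learnable per-feature scale, initialized to 1
y = rms_norm(x, gamma)
print(y)
```

Because the mean is never computed or subtracted, RMSNorm is cheaper than LayerNorm while keeping the re-scaling invariance that matters most for training stability.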