[Machine Learning] Note of LayerNorm

Last Updated on 2024-08-19 by Clay

The working principle of LayerNorm is as follows: each input is normalized using the mean and variance computed over its feature dimension, where ϵ is a small constant added to the denominator to avoid division by zero. Finally, a learnable scale parameter γ and shift parameter β (both learned through training, initialized as γ = 1 and β = 0) apply an element-wise affine transformation to each normalized element.
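For reference, the standard formulation (a minimal sketch, assuming normalization over the last feature dimension of size H, with the same symbols ϵ, γ, and β as above) is:

$$
\mu = \frac{1}{H}\sum_{i=1}^{H} x_i,\qquad
\sigma^2 = \frac{1}{H}\sum_{i=1}^{H} (x_i - \mu)^2,\qquad
y_i = \gamma \cdot \frac{x_i - \mu}{\sqrt{\sigma^2 + \epsilon}} + \beta
$$

The short PyTorch sketch below illustrates this computation and checks it against `torch.nn.LayerNorm`; the helper `manual_layer_norm` and the tensor shapes are illustrative assumptions, not part of the original note.

```python
import torch
import torch.nn as nn


def manual_layer_norm(x: torch.Tensor,
                      gamma: torch.Tensor,
                      beta: torch.Tensor,
                      eps: float = 1e-5) -> torch.Tensor:
    # Mean and (biased) variance are computed per sample, over the feature dimension
    mean = x.mean(dim=-1, keepdim=True)
    var = x.var(dim=-1, unbiased=False, keepdim=True)
    # Normalize, then apply the learnable scale (gamma) and shift (beta)
    x_hat = (x - mean) / torch.sqrt(var + eps)
    return gamma * x_hat + beta


if __name__ == "__main__":
    torch.manual_seed(0)
    x = torch.randn(2, 4, 8)          # (batch, sequence, features)

    layer_norm = nn.LayerNorm(8)      # gamma initialized to 1, beta to 0
    expected = layer_norm(x)
    actual = manual_layer_norm(x, layer_norm.weight, layer_norm.bias)

    print(torch.allclose(expected, actual, atol=1e-6))  # True
```

Note that the variance is computed with `unbiased=False`, matching the biased estimator used by LayerNorm.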