[Machine Learning] Note of LayerNorm
Last Updated on 2024-08-19 by Clay

The working principle of LayerNorm is as follows:

y = γ · (x − μ) / √(σ² + ϵ) + β

Here μ and σ² are the mean and variance computed over the feature dimension, and ϵ is a small number added to avoid division by zero. Finally, a learnable scale parameter γ and shift parameter β (both learned through training, initialized as γ = 1 and β = 0) are used to perform a linear transformation on each normalized input element. The reason …
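As a quick sketch of the computation above (this is not the post's original code; the class name `MyLayerNorm`, the tensor shapes, and the comparison against `torch.nn.LayerNorm` are illustrative assumptions), the following PyTorch snippet normalizes over the last dimension, adds ϵ inside the square root, and then applies γ and β:

```python
import torch
import torch.nn as nn


class MyLayerNorm(nn.Module):
    """Minimal LayerNorm over the last dimension (illustrative sketch)."""

    def __init__(self, dim: int, eps: float = 1e-5):
        super().__init__()
        self.eps = eps
        self.gamma = nn.Parameter(torch.ones(dim))   # scale γ, initialized to 1
        self.beta = nn.Parameter(torch.zeros(dim))   # shift β, initialized to 0

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Mean and (biased) variance over the feature dimension
        mean = x.mean(dim=-1, keepdim=True)
        var = x.var(dim=-1, unbiased=False, keepdim=True)
        # eps keeps the denominator away from zero
        x_hat = (x - mean) / torch.sqrt(var + self.eps)
        # Learnable linear transformation on each normalized element
        return self.gamma * x_hat + self.beta


if __name__ == "__main__":
    x = torch.randn(2, 4, 8)
    ln = MyLayerNorm(8)
    ref = nn.LayerNorm(8)  # built-in LayerNorm with default settings
    print(torch.allclose(ln(x), ref(x), atol=1e-6))  # expected: True
```

Because `nn.LayerNorm` also initializes its affine parameters to γ = 1 and β = 0 and normalizes with the biased variance, the two outputs should agree within floating-point tolerance.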