[Machine Learning] Note Of SiLU Activation Function
Last Updated on 2024-06-06 by Clay Introduction The SiLU (Sigmoid Linear Unit) activation function is essentially the same as the Swish function; Swish simply adds a trainable beta parameter. Many large language models (LLMs) also adopt this activation, primarily in models that experiment with activation functions other than ReLU, such as the classic Llama architecture. I show the …
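As a minimal sketch (pure Python, not the post's original code), the relationship between SiLU and Swish can be written as follows; with beta fixed to 1, Swish reduces exactly to SiLU:

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def silu(x: float) -> float:
    # SiLU(x) = x * sigmoid(x)
    return x * sigmoid(x)

def swish(x: float, beta: float = 1.0) -> float:
    # Swish(x) = x * sigmoid(beta * x); beta may be a trainable parameter.
    # When beta == 1.0, this is identical to SiLU.
    return x * sigmoid(beta * x)
```

In frameworks such as PyTorch, the same function is available directly (e.g. `torch.nn.SiLU`), so defining it by hand is only needed for illustration.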