[Machine Learning] Note Of SiLU Activation Function
Introduction
The SiLU (Sigmoid Linear Unit) activation function is closely related to the Swish function: SiLU(x) = x · σ(x), while Swish adds a trainable β parameter, Swish(x) = x · σ(βx), so SiLU is the special case β = 1. Many large language models (LLMs) also adopt this activation, particularly models that moved away from ReLU, such as the classic Llama architecture.
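As a quick illustration of the relationship described above, here is a minimal NumPy sketch (function names and the example inputs are my own, not from any particular library) showing that SiLU is just Swish with β fixed to 1:

```python
import numpy as np

def sigmoid(x):
    # Standard logistic sigmoid: 1 / (1 + e^(-x))
    return 1.0 / (1.0 + np.exp(-x))

def silu(x):
    # SiLU: x * sigmoid(x); equivalent to Swish with beta = 1
    return x * sigmoid(x)

def swish(x, beta=1.0):
    # Swish: x * sigmoid(beta * x); beta can be a trainable parameter
    return x * sigmoid(beta * x)

x = np.linspace(-6.0, 6.0, 5)
print(silu(x))              # same values as swish(x, beta=1.0)
print(swish(x, beta=1.5))   # sharper transition around zero
```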