[Machine Learning] Note Of SiLU Activation Function

Last Updated on 2024-06-06 by Clay

Introduction

The SiLU (Sigmoid Linear Unit) activation function is similar to the Swish function; the only difference is that Swish has an additional trainable beta parameter. Many large language models (LLMs) also adopt SiLU, mainly among exploratory models that use activation functions other than ReLU, such as the classic Llama architecture. I show the …
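This is a minimal sketch, not the author's code. The standard definitions are SiLU(x) = x · σ(x) and Swish(x) = x · σ(βx), where σ is the logistic sigmoid, so Swish with β = 1 is exactly SiLU. The helper names `sigmoid`, `silu`, and `swish` are hypothetical. Here `beta` is passed in as a fixed float; in a real model it would be a learned parameter.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic sigmoid, 1 / (1 + e^(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, rewrite as e^x / (1 + e^x) to avoid overflow in exp(-x).
    z = math.exp(x)
    return z / (1.0 + z)


def silu(x: float) -> float:
    """SiLU(x) = x * sigmoid(x)."""
    return x * sigmoid(x)


def swish(x: float, beta: float = 1.0) -> float:
    """Swish(x) = x * sigmoid(beta * x).

    beta is a plain float here (assumption); in a trained network it
    would be a learnable parameter. With beta = 1.0 this equals SiLU.
    """
    return x * sigmoid(beta * x)


if __name__ == "__main__":
    for x in [-2.0, -1.0, 0.0, 1.0, 2.0]:
        print(f"x={x:+.1f}  silu={silu(x):+.4f}  swish(beta=2)={swish(x, 2.0):+.4f}")
```

As beta grows large, σ(βx) approaches a step function and Swish approaches ReLU. In practice, frameworks ship SiLU built in; for example, PyTorch provides `torch.nn.SiLU` and `torch.nn.functional.silu`.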