[Machine Learning] Notes on the SiLU Activation Function

Last Updated on 2024-06-06 by Clay

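Introduction

The SiLU (Sigmoid Linear Unit) activation function closely resembles the Swish function; the only difference is the trainable beta coefficient that Swish carries. Many large language models (LLMs) now adopt it, mainly those that explore activations other than ReLU, such as the classic Llama architecture.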

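The formulas for SiLU and Swish make the difference clear:

$$\mathrm{SiLU}(x) = x \cdot \sigma(x)$$

$$\mathrm{Swish}(x) = x \cdot \sigma(\beta x)$$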

For reference, the Sigmoid function is:

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$
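The relationship between the two functions can be sketched in plain Python. This is only an illustration, not a reference implementation: the helper names `sigmoid`, `swish`, and `silu` are made up for this example, and `beta` is treated as an ordinary argument rather than a trained parameter.

```python
import math

def sigmoid(x: float) -> float:
    # Logistic function, written to avoid overflow for large negative x
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def swish(x: float, beta: float = 1.0) -> float:
    # Swish scales the input by beta before it goes through the sigmoid
    return x * sigmoid(beta * x)

def silu(x: float) -> float:
    # SiLU is Swish with beta fixed at 1
    return swish(x, beta=1.0)

for x in (-2.0, 0.0, 2.0):
    print(x, silu(x), swish(x, beta=0.5), swish(x, beta=2.0))
```

With `beta = 0` the sigmoid term is always 0.5, so Swish reduces to the linear function x / 2; larger values of `beta` push the curve closer to ReLU.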

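As the input x grows larger, SiLU approaches x; as the input becomes more negative, the output approaches 0. Intuitively, the SiLU curve is very smooth: the output varies continuously with the input and is differentiable everywhere, so gradient descent works well with it because the derivative never changes abruptly.

Implementation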
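The limiting behaviour and the smooth derivative can be checked numerically. The sketch below uses only the standard library; `silu_grad` is a helper name chosen for this example, and it relies on the identity SiLU'(x) = σ(x)(1 + x(1 − σ(x))), which follows from the product rule.

```python
import math

def sigmoid(x: float) -> float:
    # Logistic function, written to avoid overflow for large negative x
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def silu(x: float) -> float:
    return x * sigmoid(x)

def silu_grad(x: float) -> float:
    # Product rule: d/dx [x * s(x)] = s(x) + x * s(x) * (1 - s(x))
    s = sigmoid(x)
    return s * (1.0 + x * (1.0 - s))

for x in (-10.0, -1.0, 0.0, 1.0, 10.0):
    print(f"x={x:6.1f}  silu={silu(x):9.5f}  grad={silu_grad(x):8.5f}")
```

At x = 10 the output is already within 0.001 of x, at x = -10 it is within 0.001 of 0, and the gradient at 0 is exactly 0.5 rather than the jump from 0 to 1 that ReLU has there.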

import torch
import matplotlib.pyplot as plt

# Define the SiLU activation function
def my_silu(x: torch.Tensor) -> torch.Tensor:
    return x * torch.sigmoid(x)

# Generate an input tensor from -10 to 10 with 100 points
x_values = torch.linspace(-10, 10, 100)

# Apply the SiLU function to the input tensor
y_values = my_silu(x_values)

# Plotting the function
plt.figure(figsize=(10, 5))
plt.plot(x_values.numpy(), y_values.numpy(), label="SiLU Activation")
plt.title("SiLU Activation Function")
plt.xlabel("Input value (x)")
plt.ylabel("Activated value (y)")
plt.legend()
plt.grid(True)
plt.show()

Output: (figure: plot of the SiLU activation over inputs from -10 to 10)
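As a sanity check, the hand-written function can be compared with PyTorch's built-in `torch.nn.functional.silu`. This is a minimal sketch that assumes a PyTorch version that ships this function; `my_silu` is repeated here so the snippet runs on its own.

```python
import torch
import torch.nn.functional as F

def my_silu(x: torch.Tensor) -> torch.Tensor:
    # Same definition as in the implementation above
    return x * torch.sigmoid(x)

x = torch.linspace(-10, 10, 100)

# Element-wise comparison within a small floating-point tolerance
matches = torch.allclose(my_silu(x), F.silu(x), atol=1e-6)
print("matches built-in:", matches)
```

In model code, the module form `torch.nn.SiLU()` is usually the more convenient choice, since it can be placed directly inside an `nn.Sequential`.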
