
[Machine Learning] Introduction to ReLU

Last Updated on 2021-06-03 by Clay

What is ReLU

The Rectified Linear Unit (ReLU) is a famous activation function used in neural network layers. It is believed to have some degree of biological plausibility, although I don't know exactly what it is. =)

Let's take a look at the ReLU formula:
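
ReLU(x) = max(0, x)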

To verify the formula, I wrote a small Python program to plot it.

# -*- coding: utf-8 -*-
import matplotlib.pyplot as plt

# Build the input range: x values from -20 to 20 in steps of 0.1
x = []
dx = -20
while dx <= 20:
    x.append(dx)
    dx += 0.1


def ReLU(x):
    # ReLU outputs 0 for negative inputs and the input itself otherwise
    if x < 0: return 0
    else: return x


px = x
py = [ReLU(xv) for xv in x]


# Plot the curve and move the axis spines to the origin
plt.plot(px, py)
ax = plt.gca()
ax.spines['right'].set_color('none')
ax.spines['top'].set_color('none')
ax.xaxis.set_ticks_position('bottom')
ax.spines['bottom'].set_position(('data', 0))
ax.yaxis.set_ticks_position('left')
ax.spines['left'].set_position(('data', 0))
plt.show()



Output:

We can set any range for the x input, and we can see that the output is zero whenever x < 0.


Leaky ReLU

Leaky ReLU function is a variant of ReLU.

Where the ReLU function sets all negative values to 0, Leaky ReLU instead multiplies negative values by a small slope a greater than 0.

Formula:
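
LeakyReLU(x) = x,      if x >= 0
LeakyReLU(x) = a * x,  if x < 0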

Below I wrote another small program, with a assigned to 0.07.

# -*- coding: utf-8 -*-
import matplotlib.pyplot as plt

# Slope applied to negative inputs
a = 0.07

# Build the input range: x values from -20 to 20 in steps of 0.1
x = []
dx = -20
while dx <= 20:
    x.append(dx)
    dx += 0.1


def LeakyReLU(x):
    # Leaky ReLU scales negative inputs by the slope a instead of zeroing them
    if x < 0: return a*x
    else: return x


px = x
py = [LeakyReLU(xv) for xv in x]


# Plot the curve and move the axis spines to the origin
plt.plot(px, py)
ax = plt.gca()
ax.spines['right'].set_color('none')
ax.spines['top'].set_color('none')
ax.xaxis.set_ticks_position('bottom')
ax.spines['bottom'].set_position(('data', 0))
ax.yaxis.set_ticks_position('left')
ax.spines['left'].set_position(('data', 0))
plt.show()



Output:

It's different from the original ReLU function: the negative side is no longer flat, but instead has a small slope of 0.07.


Application

  • You can call the ReLU function easily when implementing models in Keras or PyTorch (see the sketch after this list)
  • Very fast to compute, since it is piecewise linear
  • Fast convergence
  • When the input is negative, the gradient is zero; if the learning rate is too large, some neurons may "die" and stop updating
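
For example, here is a minimal sketch of calling the built-in activation modules, assuming PyTorch is installed (Keras provides similar layers):

import torch
import torch.nn as nn

# Built-in activation modules from PyTorch
relu = nn.ReLU()
leaky_relu = nn.LeakyReLU(negative_slope=0.07)  # same slope as the example above

x = torch.tensor([-3.0, -1.0, 0.0, 2.0])
print(relu(x))        # tensor([0., 0., 0., 2.])
print(leaky_relu(x))  # tensor([-0.2100, -0.0700, 0.0000, 2.0000])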

Reference

Paper: https://arxiv.org/pdf/1811.03378.pdf

