[PyTorch] Tutorial(3) Introduction of Neural Networks

A neural network is the model architecture we want to build for deep learning. The official PyTorch documentation states it clearly in its first sentence:

You can use torch.nn to build a neural network.

An nn.Module contains the model layers and a forward() method that returns the output, as you can clearly see in the code that follows.


First, let’s go over the basic training process of a neural network (a rough code sketch of the whole loop follows the list):

  1. Define the neural network and its learnable parameters (weights)
  2. Iterate over the training data as input
  3. Let the configured neural network process the input data
  4. Calculate the loss, which is the “gap between the output and the correct answer”
  5. Propagate the loss backward to get the gradient of each weight
  6. Update the weights of the neural network, usually with a simple rule such as: new weight = old weight - (learning rate * gradient)
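
The whole process, put together, looks roughly like the sketch below. This is only an outline, not the official tutorial code; model, data_loader, criterion, and optimizer are assumed placeholders for objects we will define properly later.

def train_one_epoch(model, data_loader, criterion, optimizer):
    for inputs, targets in data_loader:      # 2. iterate over the training data
        optimizer.zero_grad()                # clear old gradients before each step
        outputs = model(inputs)              # 3. forward pass through the network
        loss = criterion(outputs, targets)   # 4. the gap between output and correct answer
        loss.backward()                      # 5. compute the gradient of every weight
        optimizer.step()                     # 6. new weight = old weight - (learning rate * gradient)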

So, let’s take a look at the official sample code below.

Before that, if you want to refer to the original tutorial, the link is here: https://pytorch.org/tutorials/beginner/blitz/neural_networks_tutorial.html#sphx-glr-beginner-blitz-neural-networks-tutorial-py

I am following their tutorial to learn as well.


Define the neural network

The following is a very basic CNN program. I have looked at all kinds of model designs on the Internet, but let’s not fool around and just use the official code:

# -*- coding: utf-8 -*-
import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 3x3 convolution kernel
        self.conv1 = nn.Conv2d(1, 6, 3)
        self.conv2 = nn.Conv2d(6, 16, 3)
        # Affine operation: y = Wx + b
        self.fc1 = nn.Linear(16*6*6, 120)  # 6*6 is the feature map size after the conv/pool layers
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the window is square, a single number is enough
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
       
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features


if __name__ == '__main__':
    net = Net()
    print(net)


Output:

Net(
   (conv1): Conv2d(1, 6, kernel_size=(3, 3), stride=(1, 1))
   (conv2): Conv2d(6, 16, kernel_size=(3, 3), stride=(1, 1))
   (fc1): Linear(in_features=576, out_features=120, bias=True)
   (fc2): Linear(in_features=120, out_features=84, bias=True)
   (fc3): Linear(in_features=84, out_features=10, bias=True)
)

As mentioned in the previous note, a neural network has two parts: forward propagation and backward propagation. In PyTorch, after we inherit from nn.Module, we need to define the forward() function ourselves; as for backward propagation, just as mentioned in [PyTorch] Tutorial(2) Automatic derivative, we can use autograd to get the gradients.
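
As a quick reminder of what autograd does for the backward part, here is a tiny sketch in the spirit of that previous note (not part of the official tutorial code):

x = torch.ones(2, 2, requires_grad=True)
y = (x * 3).sum()
y.backward()      # autograd computes dy/dx for us
print(x.grad)     # a 2x2 tensor filled with 3s, no manual derivation needed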

params = list(net.parameters())
print(len(params))

for n in range(len(params)):
    print(params[n].size())


Output:

10
torch.Size([6, 1, 3, 3])
torch.Size([6])
torch.Size([16, 6, 3, 3])
torch.Size([16])
torch.Size([120, 576])
torch.Size([120])
torch.Size([84, 120])
torch.Size([84])
torch.Size([10, 84])
torch.Size([10])

According to the official explanation, the 10 entries above are the sizes of the model's learnable parameter tensors, and each parameter has a different size.
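
If you want to see which layer each of these tensors belongs to, you can also iterate over net.named_parameters(), which yields (name, parameter) pairs. A small sketch, reusing the net defined above:

for name, param in net.named_parameters():
    print(name, tuple(param.size()))
# prints conv1.weight (6, 1, 3, 3), conv1.bias (6,), ... down to fc3.bias (10,)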

Next we are going to generate a random input. To match the MNIST-style images (I have demonstrated how to use Keras to build a classifier for MNIST in [Keras] Use CNN to build a simple classifier to MNIST), we have to set the size to 32×32.

The familiar MNIST images seem to be 28×28, right? As for why this network uses 32×32, I actually found the explanation on StackOverflow: https://stackoverflow.com/questions/28525436/why-the-lenet5-uses-32%C3%9732-image-as-input
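
By the way, if you ever want to feed real 28×28 MNIST images into this network, one simple option is to pad each image with 2 pixels on every side so it becomes 32×32. A minimal sketch, assuming mnist_image is a stand-in tensor rather than a real MNIST sample:

mnist_image = torch.randn(1, 1, 28, 28)      # stand-in for a real 28x28 MNIST image
padded = F.pad(mnist_image, (2, 2, 2, 2))    # pad 2 pixels on left/right/top/bottom
print(padded.size())                         # torch.Size([1, 1, 32, 32])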

Without further ado, here is the input code.

input = torch.randn(1, 1, 32, 32)
out = net(input)
print(out)


Output:

tensor([[-0.0130, -0.0792, -0.1078, -0.1333,
          0.0815,  0.0615, -0.1001, -0.1203,
          0.0502, -0.0364]], grad_fn=<AddmmBackward>)

The value we get is what our input returns after passing through forward(). Next we have to perform a very important action in PyTorch: zero the gradient buffers of all parameters.

If we don’t do this, the gradients will accumulate on every backward pass, resulting in wrong training results.

The code is as follows:

net.zero_grad()
out.backward(torch.randn(1, 10))
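
To see what “accumulate” means here, run two forward/backward passes without zeroing in between: the second set of gradients is simply added on top of the first. A small sketch of my own (not from the official tutorial), continuing with the same net and input:

net.zero_grad()
out = net(input)
out.backward(torch.ones(1, 10))
first = net.conv1.bias.grad.clone()

out = net(input)                  # second forward/backward WITHOUT zero_grad()
out.backward(torch.ones(1, 10))
print(torch.allclose(net.conv1.bias.grad, first * 2))   # True: the gradients added up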



Loss function

A loss function in PyTorch takes its inputs in the form (prediction, target). There are many ways to calculate the loss, such as Mean Squared Error (MSE), Mean Absolute Error (MAE), Cross Entropy, etc.
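
As a quick illustration of how these are called, here is a minimal sketch; prediction and the two targets are just made-up tensors, not real model output:

prediction = torch.randn(1, 10)                       # made-up model output
regression_target = torch.randn(1, 10)                # made-up regression target

mse = nn.MSELoss()(prediction, regression_target)     # Mean Squared Error
mae = nn.L1Loss()(prediction, regression_target)      # Mean Absolute Error (L1)
class_target = torch.tensor([3])                      # cross entropy wants a class index
ce = nn.CrossEntropyLoss()(prediction, class_target)
print(mse, mae, ce)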

When we build a model for a prediction task, we naturally hope that all of the model's predictions are correct; in practice, the predictions will differ from the correct answers in the training data, and the gap between the two is the loss.

Today we first follow the Mean Squared Error (MSE) demo on the official website. The corresponding loss function can be found under nn.

Since our input is determined by random numbers, let’s also use random numbers to make up a correct answer, shall we? After all, our purpose here is just to learn how to calculate the loss.

# Input & Target
input = torch.randn(1, 1, 32, 32)
out = net(input)
target = torch.randn(10).view(1, -1)
print(out.size())
print(target.size())


Output:

torch.Size([1, 10])
torch.Size([1, 10])

Don’t forget, we need to use view() to reshape the target so that its size matches the prediction. Here we can see that the dimensions of the two are the same.

Then we can use the function provided by PyTorch to calculate loss:

# Loss function
criterion = nn.MSELoss()
loss = criterion(out, target)
print(loss)


Output:

tensor(0.579, grad_fn=<MseLossBackward>)

The above is how we calculate loss.


Backward propagation

Backward propagation, also known as backpropagation, is how we update the weights of our model: by computing the gap between the prediction and the correct answer (via the loss function) and propagating it backward, the model becomes more and more accurate.

# Backprop
net.zero_grad()

# Before
print('Before:', net.conv1.bias.grad)

# After
loss.backward()
print('After:', net.conv1.bias.grad)


Output:

Before: tensor([0., 0., 0., 0., 0., 0.])
After: tensor([-0.0186,  0.0026,  0.0069, -0.0020,  0.0032, -0.0144])

Remember to use zero_grad() to clear the gradient buffers, otherwise the gradients will keep accumulating and cause incorrect results.


Update weight

Do you remember the weight update rule from the beginning of these notes? Usually:

new weight = old weight – (learning rate * gradient)

This is called Stochastic Gradient Descent (SGD), and the official website gives a concise piece of code to implement it:

learning_rate = 0.01
for f in net.parameters():
    f.data.sub_(f.grad.data * learning_rate)


In addition, you can also try different update rules, such as plain SGD, Adam, RMSProp, etc., all of which are available in torch.optim.
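
For example, one training step with torch.optim might look like the sketch below, based on the snippet in the official tutorial and reusing the net, input, target, and criterion from above:

import torch.optim as optim

optimizer = optim.SGD(net.parameters(), lr=0.01)

optimizer.zero_grad()             # clear the gradient buffers
out = net(input)
loss = criterion(out, target)
loss.backward()
optimizer.step()                  # apply: weight -= learning rate * gradient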

I hope we can train a real model soon.

