
[PyTorch] Tutorial (2): Automatic Differentiation

Anyone familiar with deep learning knows how important automatic differentiation is. Here I will make a brief note about it.

Training a deep learning model involves two kinds of computation: forward propagation and backward propagation.

In PyTorch, the forward pass is defined by the user: we decide how the layers of our model are connected. At the end of the forward computation, the model's output is compared with the training labels, and the difference between them is the so-called loss.

Backward propagation then differentiates the loss function to obtain gradients, which gradient descent uses to update the weights of our model.
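
As a rough sketch of how these two steps fit together (the layer size, loss function, and optimizer below are placeholders I picked just for illustration), a single training step in PyTorch looks roughly like this:

import torch
import torch.nn as nn

# a tiny placeholder model: one linear layer (sizes chosen arbitrarily)
model = nn.Linear(4, 1)
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.randn(8, 4)               # dummy input batch
target = torch.randn(8, 1)          # dummy labels (the "standard solution")

prediction = model(x)               # forward propagation, defined by us
loss = loss_fn(prediction, target)  # difference from the labels: the loss

optimizer.zero_grad()               # clear old gradients
loss.backward()                     # backward propagation: autograd computes gradients
optimizer.step()                    # gradient descent updates the weights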

Given this, it should be self-evident how important automatic differentiation is for our model. Let's take a look at how to do it in PyTorch.

My notes basically follow the official PyTorch tutorial; I learned from their website and recommend it here: https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html#sphx-glr-beginner-blitz-autograd-tutorial-py


Autograd

First, create a simple 2×2 matrix filled with ones and set requires_grad=True.

import torch

a = torch.ones(2, 2, requires_grad=True)
print('a:\n', a)


Output:

a:
tensor([[1., 1.],
        [1., 1.]], requires_grad=True)
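
Setting requires_grad=True tells PyTorch to track every operation performed on this tensor so that gradients can be computed later. As a small side sketch (not part of the original example), tracking can also be switched on after creation, or temporarily disabled:

import torch

x = torch.ones(2, 2)       # requires_grad defaults to False
print(x.requires_grad)     # False
x.requires_grad_(True)     # switch gradient tracking on in place
print(x.requires_grad)     # True

with torch.no_grad():      # operations inside this block are not tracked
    y = x + 5
print(y.requires_grad)     # False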

Next, we assign a+5 to the variable b.

b = a + 5
print('b:\n', b)
print('b grad:\n', b.grad_fn)


Output:

b:
tensor([[6., 6.],
        [6., 6.]], grad_fn=<AddBackward0>)

b grad:
<AddBackward0 object at 0x7f7ae5fde80>
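
b has a grad_fn because it was produced by an operation (the addition). By contrast, a tensor we created ourselves, like a, is a leaf tensor and has no grad_fn; a quick check:

# a was created directly by us (a leaf tensor), so it has no grad_fn
print('a grad_fn:\n', a.grad_fn)   # prints: None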


Then we continue with the next operation: assign b*b*2 to the variable c, and take its mean as out:

c = b*b*2
out = c.mean()
print('c:\n', c)
print('c out:\n', out)


Output:

c:
tensor([[72., 72.],
        [72., 72.]], grad_fn=<MulBackward0>)

c out:
tensor(72., grad_fn=<MeanBackward0>)

mean() takes the average; since all four values of c are 72, their mean out is also 72.

Finally, we can compute the gradients. Assume that the variable out is our final output; it contains only a single scalar value:

print('a grad:\n', a.grad)
out.backward()
print('a grad:\n', a.grad)


Output:

a grad:
None

a grad:
tensor([[6., 6.],
        [6., 6.]])
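
Note that a.grad is None until backward() has been called at least once, and that PyTorch accumulates gradients: calling backward() again adds the new gradients to a.grad instead of replacing them, which is why training loops clear gradients before each backward pass. A small sketch (rebuilding the graph for each backward call):

import torch

a = torch.ones(2, 2, requires_grad=True)
for _ in range(2):
    out = (2 * (a + 5) ** 2).mean()
    out.backward()
print(a.grad)    # tensor([[12., 12.], [12., 12.]]): 6 + 6, accumulated over two calls
a.grad.zero_()   # clear the accumulated gradients before the next step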

Since our out has only one value, we can write:

out.backward()


This command is equivalent to:

out.backward(torch.tensor(1.))
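
When the output is not a single scalar, backward() needs to be given a gradient tensor to propagate (the vector in a vector-Jacobian product). A small sketch of that case, reusing the same computation:

import torch

a = torch.ones(2, 2, requires_grad=True)
c = 2 * (a + 5) ** 2           # c is a 2x2 tensor, not a scalar

# c.backward() alone would raise an error; we must pass a gradient tensor.
# All ones weights every element of c equally, like summing c first.
c.backward(torch.ones_like(c))
print(a.grad)                  # tensor([[24., 24.], [24., 24.]])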


So why do we get an answer like a.grad = [[6., 6.], [6., 6.]]?

Since every entry of a has the same fixed value, we can simplify the problem to a single entry and differentiate out with respect to a:

a = 1
b = a + 5
c = 2b^2
out = (1/4) * sum(c)

=> out' = (1/4) * c' = (1/4) * 4b = b
=> out' = a + 5
=> out' = 1 + 5 = 6

This is a fairly simple derivation; anything more complicated is best left to autograd to calculate.
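
As a quick sanity check of this rule (the gradient of out with respect to each entry of a is simply a + 5), we can start from a different value, say a matrix of 3s; this snippet is just an extra check, not part of the original example:

import torch

a = torch.full((2, 2), 3.0, requires_grad=True)
out = (2 * (a + 5) ** 2).mean()
out.backward()
print(a.grad)   # tensor([[8., 8.], [8., 8.]]), i.e. a + 5 = 3 + 5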

