What Is an AutoEncoder?
An AutoEncoder, often abbreviated as AE, is a neural network for unsupervised learning; in other words, it does not require labeled data.
So what is the purpose of AutoEncoder?
The AutoEncoder family is actually huge, with quite a few variants suited to all kinds of tasks. But if I had to describe briefly what an AutoEncoder does, I think it can be summarized by the following picture.
The AutoEncoder architecture is divided into two parts: the Encoder and the Decoder. First, the input is fed into the Encoder, whose neural network compresses it into a low-dimensional code (the "code" in the picture). The code is then fed into the Decoder, which decodes it into the final output.
We therefore design the loss function so that the output resembles the input as closely as possible: the more similar the two are, the better.
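To make "the more similar, the better" concrete, here is a minimal sketch of the mean squared error (the loss used later in this post) between an input and two candidate reconstructions. The tensors are made-up values for illustration only, and other reconstruction losses are possible.

```python
import torch
import torch.nn as nn

# Hypothetical flattened "image" and two candidate reconstructions.
inputs = torch.tensor([[0.0, 0.5, 1.0, 0.5]])
good_output = torch.tensor([[0.0, 0.5, 1.0, 0.25]])
bad_output = torch.tensor([[1.0, 0.0, 0.0, 1.0]])

# MSE averages the squared difference over all elements.
loss_function = nn.MSELoss()
good_loss = loss_function(good_output, inputs).item()
bad_loss = loss_function(bad_output, inputs).item()

print(good_loss)  # 0.015625
print(bad_loss)   # 0.625
```

The closer reconstruction gets the smaller loss, which is exactly the signal the optimizer follows during training.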
The most intuitive uses of an AutoEncoder are de-noising and dimensionality reduction.
When the Encoder compresses our input into a low-dimensional CODE, and the Decoder can turn that CODE back into an output very similar to the input, we can reasonably say that the CODE represents the low-dimensional features of the entire input.
In that case, we may be able to use the compressed CODE in subsequent deep learning steps to reduce the computational cost.
After all, if the low-dimensional CODE carries much the same features as the high-dimensional data, computing on the high-dimensional data will be slower than computing on the CODE.
In addition, low-dimensional data is easier to visualize.
Not only that: if we can control the values of the intermediate CODE, we can even use it to generate fake data.
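As a rough sketch of that last idea, the snippet below picks 2-dimensional codes by hand and pushes them through a decoder. The decoder here is a small untrained stand-in (an assumption for illustration), so its outputs are meaningless; with a trained model you would use its decoder instead. The code range of [-1, 1] is also assumed.

```python
import torch
import torch.nn as nn

# Stand-in decoder mapping a 2-D code to a 784-pixel image (untrained).
decoder = nn.Sequential(
    nn.Linear(2, 16), nn.Tanh(),
    nn.Linear(16, 784), nn.Sigmoid(),
)

# Choose codes by hand: every pair from a 4-point grid on each axis.
axis = torch.linspace(-1.0, 1.0, steps=4)
codes = torch.cartesian_prod(axis, axis)  # shape (16, 2)

# Decode each chosen code into a 28x28 "fake" image.
with torch.no_grad():
    fake_images = decoder(codes).view(-1, 28, 28)

print(codes.shape)        # torch.Size([16, 2])
print(fake_images.shape)  # torch.Size([16, 28, 28])
```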
So what does de-noising mean?
Suppose we encode the input into a low-dimensional CODE, decode that CODE back into an output with the same dimensionality as the input, and train so that "the more similar the input and output, the better". We can then perhaps understand the CODE as having learned the important features of the input and discarded the unimportant ones.
I think this is the principle behind de-noising. For more details, you may refer to the DAE (Denoising AutoEncoder).
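The usual denoising setup can be sketched as a single training step: corrupt the input, but compute the loss against the clean original. Everything below is an assumption for illustration (a tiny stand-in network, random tensors instead of MNIST, Gaussian noise with a level of 0.3); it is not the model trained later in this post.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Tiny stand-in autoencoder; the layer sizes are arbitrary for this sketch.
model = nn.Sequential(
    nn.Linear(784, 32), nn.Tanh(),
    nn.Linear(32, 784), nn.Sigmoid(),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_function = nn.MSELoss()

# Random tensors stand in for a batch of flattened images in [0, 1].
clean = torch.rand(8, 784)

# Corrupt the input with Gaussian noise (0.3 is an assumed noise level).
noisy = (clean + 0.3 * torch.randn_like(clean)).clamp(0.0, 1.0)

# One training step: feed the noisy input, compare against the clean target.
optimizer.zero_grad()
loss = loss_function(model(noisy), clean)
loss.backward()
optimizer.step()
```

Because the target is the clean image, the network is pushed to keep what the images have in common and drop the noise.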
Below, I use PyTorch to build a simple AutoEncoder model. The input data is the classic MNIST dataset. The goal is to produce pictures that look like the inputs, and to visualize the intermediate code obtained after compression and dimensionality reduction.
Building an AutoEncoder with PyTorch
Below, I explain step by step how I build the AutoEncoder model.
First, we import all the packages we need.
# coding: utf-8
import torch
import torch.nn as nn
import torch.utils.data as data
import torchvision
Then we set the hyperparameters, such as epochs, batch_size, and lr (the learning rate), and load the MNIST dataset from torchvision.
# Settings
epochs = 10
batch_size = 128
lr = 0.008

# DataLoader
train_set = torchvision.datasets.MNIST(
    root='mnist',
    train=True,
    download=True,
    transform=torchvision.transforms.ToTensor(),
)
train_loader = data.DataLoader(train_set, batch_size=batch_size, shuffle=True)
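With ToTensor, each MNIST image becomes a 1x28x28 float tensor with values in [0, 1], so a full batch from this loader has shape (128, 1, 28, 28). Since the model uses fully connected layers, each image has to be flattened into 784 values before it goes in. A small sketch, using a random tensor as a stand-in for a real batch:

```python
import torch

# Random stand-in for one batch from the loader: (batch, channel, height, width).
images = torch.rand(128, 1, 28, 28)

# Flatten every 1x28x28 image into a vector of 784 values.
inputs = images.view(-1, 784)

print(images.shape)  # torch.Size([128, 1, 28, 28])
print(inputs.shape)  # torch.Size([128, 784])
```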
Define the model architecture of AutoEncoder. As mentioned above, it is divided into two parts: Encoder and Decoder.
# Model structure
class AutoEncoder(nn.Module):
    def __init__(self):
        super(AutoEncoder, self).__init__()

        # Encoder: 784 -> 128 -> 64 -> 16 -> 2
        self.encoder = nn.Sequential(
            nn.Linear(784, 128),
            nn.Tanh(),
            nn.Linear(128, 64),
            nn.Tanh(),
            nn.Linear(64, 16),
            nn.Tanh(),
            nn.Linear(16, 2),
        )

        # Decoder: 2 -> 16 -> 64 -> 128 -> 784
        self.decoder = nn.Sequential(
            nn.Linear(2, 16),
            nn.Tanh(),
            nn.Linear(16, 64),
            nn.Tanh(),
            nn.Linear(64, 128),
            nn.Tanh(),
            nn.Linear(128, 784),
            nn.Sigmoid()
        )

    def forward(self, inputs):
        codes = self.encoder(inputs)
        decoded = self.decoder(codes)

        return codes, decoded
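Before training, a quick sanity check can confirm that the shapes line up. The sketch below repeats the class so that it runs on its own and feeds in a random batch (a stand-in for real images): the code should have 2 values per image and the output 784.

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(784, 128), nn.Tanh(),
            nn.Linear(128, 64), nn.Tanh(),
            nn.Linear(64, 16), nn.Tanh(),
            nn.Linear(16, 2),
        )
        self.decoder = nn.Sequential(
            nn.Linear(2, 16), nn.Tanh(),
            nn.Linear(16, 64), nn.Tanh(),
            nn.Linear(64, 128), nn.Tanh(),
            nn.Linear(128, 784), nn.Sigmoid(),
        )

    def forward(self, inputs):
        codes = self.encoder(inputs)
        decoded = self.decoder(codes)
        return codes, decoded

model = AutoEncoder()
batch = torch.rand(4, 784)  # random stand-in for 4 flattened images

with torch.no_grad():
    codes, decoded = model(batch)

print(codes.shape)    # torch.Size([4, 2])
print(decoded.shape)  # torch.Size([4, 784])
```

The final Sigmoid keeps every output pixel between 0 and 1, which matches the range of the ToTensor inputs.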
Create the model and choose the optimizer and loss function.
# Optimizer and loss function
model = AutoEncoder()
optimizer = torch.optim.Adam(model.parameters(), lr=lr)
loss_function = nn.MSELoss()
Then we train the model and save it.
# Train
for epoch in range(epochs):
    # The loop variable is named "images" so it does not shadow the imported "data" module
    for images, labels in train_loader:
        inputs = images.view(-1, 784)

        # Forward
        codes, decoded = model(inputs)

        # Backward
        optimizer.zero_grad()
        loss = loss_function(decoded, inputs)
        loss.backward()
        optimizer.step()

    # Show progress (loss of the last batch in this epoch)
    print('[{}/{}] Loss:'.format(epoch+1, epochs), loss.item())

# Save the whole model object
torch.save(model, 'autoencoder.pth')
Output:
[1/10] Loss: 0.04639464616775513
[2/10] Loss: 0.04818795993924141
[3/10] Loss: 0.038940753787755966
[4/10] Loss: 0.039030447602272034
[5/10] Loss: 0.041724737733602524
[6/10] Loss: 0.03994645178318024
[7/10] Loss: 0.03632541000843048
[8/10] Loss: 0.041585564613342285
[9/10] Loss: 0.036579448729753494
[10/10] Loss: 0.04153323173522949
With that, training is complete and the model is saved to autoencoder.pth.
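Note that the loss printed above is the loss of the last mini-batch in each epoch, which is why it jumps around from epoch to epoch. One possible alternative is to report the average over the whole epoch. The sketch below shows the bookkeeping; it assumes a tiny stand-in model and random tensors in place of MNIST so that it runs quickly.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Random tensors stand in for flattened images; the labels are dummy zeros.
fake_images = torch.rand(256, 784)
fake_labels = torch.zeros(256, dtype=torch.long)
loader = DataLoader(TensorDataset(fake_images, fake_labels), batch_size=64, shuffle=True)

# Tiny stand-in autoencoder (not the architecture from this post).
model = nn.Sequential(nn.Linear(784, 2), nn.Linear(2, 784), nn.Sigmoid())
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_function = nn.MSELoss()

num_epochs = 3
epoch_losses = []
for epoch in range(num_epochs):
    running_loss = 0.0
    for images, _ in loader:
        optimizer.zero_grad()
        loss = loss_function(model(images), images)
        loss.backward()
        optimizer.step()
        # Weight each batch by its size so a smaller final batch is counted fairly.
        running_loss += loss.item() * images.size(0)

    epoch_losses.append(running_loss / len(loader.dataset))
    print('[{}/{}] Mean loss: {:.4f}'.format(epoch + 1, num_epochs, epoch_losses[-1]))
```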
The following is the complete training code:
# coding: utf-8
import torch
import torch.nn as nn
import torch.utils.data as data
import torchvision


# Settings
epochs = 10
batch_size = 128
lr = 0.008


# DataLoader
train_set = torchvision.datasets.MNIST(
    root='mnist',
    train=True,
    download=True,
    transform=torchvision.transforms.ToTensor(),
)
train_loader = data.DataLoader(train_set, batch_size=batch_size, shuffle=True)


# Model structure
class AutoEncoder(nn.Module):
    def __init__(self):
        super(AutoEncoder, self).__init__()

        # Encoder: 784 -> 128 -> 64 -> 16 -> 2
        self.encoder = nn.Sequential(
            nn.Linear(784, 128),
            nn.Tanh(),
            nn.Linear(128, 64),
            nn.Tanh(),
            nn.Linear(64, 16),
            nn.Tanh(),
            nn.Linear(16, 2),
        )

        # Decoder: 2 -> 16 -> 64 -> 128 -> 784
        self.decoder = nn.Sequential(
            nn.Linear(2, 16),
            nn.Tanh(),
            nn.Linear(16, 64),
            nn.Tanh(),
            nn.Linear(64, 128),
            nn.Tanh(),
            nn.Linear(128, 784),
            nn.Sigmoid()
        )

    def forward(self, inputs):
        codes = self.encoder(inputs)
        decoded = self.decoder(codes)

        return codes, decoded


# Optimizer and loss function
model = AutoEncoder()
optimizer = torch.optim.Adam(model.parameters(), lr=lr)
loss_function = nn.MSELoss()


# Train
for epoch in range(epochs):
    # The loop variable is named "images" so it does not shadow the imported "data" module
    for images, labels in train_loader:
        inputs = images.view(-1, 784)

        # Forward
        codes, decoded = model(inputs)

        # Backward
        optimizer.zero_grad()
        loss = loss_function(decoded, inputs)
        loss.backward()
        optimizer.step()

    # Show progress (loss of the last batch in this epoch)
    print('[{}/{}] Loss:'.format(epoch+1, epochs), loss.item())


# Save the whole model object
torch.save(model, 'autoencoder.pth')
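The script above saves the whole model object with torch.save(model, ...), which ties the file to the class definition. A commonly used alternative is to save only the parameters through state_dict. The sketch below shows the round trip with a small stand-in network; build_model and the file name are hypothetical names chosen for this example.

```python
import os
import tempfile

import torch
import torch.nn as nn

def build_model():
    # Small stand-in network; the same idea applies to the AutoEncoder class.
    return nn.Sequential(nn.Linear(784, 2), nn.Linear(2, 784), nn.Sigmoid())

model = build_model()

# Save only the learned parameters (hypothetical file name in a temp directory).
path = os.path.join(tempfile.mkdtemp(), 'autoencoder_state.pth')
torch.save(model.state_dict(), path)

# To restore, rebuild the architecture first, then load the parameters into it.
restored = build_model()
restored.load_state_dict(torch.load(path))
restored.eval()
```

With this approach the loading script still needs the code that builds the architecture, but the saved file contains only tensors.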
Testing the Model
Now that we have trained the AutoEncoder model, let's take a look at the pictures restored from the compressed CODE. Do they look like the original pictures?
# coding: utf-8
import torch
import torch.nn as nn
import torch.utils.data as data
import torchvision
import numpy as np
import matplotlib.pyplot as plt


# Settings
plt.rcParams['figure.figsize'] = (10.0, 8.0)
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'


# Show images in a square grid
def show_images(images):
    sqrtn = int(np.ceil(np.sqrt(images.shape[0])))

    for index, image in enumerate(images):
        plt.subplot(sqrtn, sqrtn, index+1)
        plt.imshow(image.reshape(28, 28))
        plt.axis('off')


# Model structure (the class must be defined here so the saved model can be unpickled)
class AutoEncoder(nn.Module):
    def __init__(self):
        super(AutoEncoder, self).__init__()

        # Encoder
        self.encoder = nn.Sequential(
            nn.Linear(784, 128),
            nn.Tanh(),
            nn.Linear(128, 64),
            nn.Tanh(),
            nn.Linear(64, 16),
            nn.Tanh(),
            nn.Linear(16, 2),
        )

        # Decoder
        self.decoder = nn.Sequential(
            nn.Linear(2, 16),
            nn.Tanh(),
            nn.Linear(16, 64),
            nn.Tanh(),
            nn.Linear(64, 128),
            nn.Tanh(),
            nn.Linear(128, 784),
            nn.Sigmoid()
        )

    def forward(self, inputs):
        codes = self.encoder(inputs)
        decoded = self.decoder(codes)

        return codes, decoded


# Load model
# weights_only=False is needed on recent PyTorch versions, where torch.load
# defaults to loading weights only; use it only for files you trust.
model = torch.load('autoencoder.pth', weights_only=False)
model.eval()
print(model)


# DataLoader
test_set = torchvision.datasets.MNIST(
    root='mnist',
    train=False,
    download=True,
    transform=torchvision.transforms.ToTensor(),
)
test_loader = data.DataLoader(test_set, batch_size=16, shuffle=False)


# Test: show the first batch of originals, then their reconstructions
with torch.no_grad():
    for images, labels in test_loader:
        inputs = images.view(-1, 28*28)
        show_images(inputs)
        plt.show()

        code, outputs = model(inputs)
        show_images(outputs)
        plt.show()
        break  # one batch is enough
Output:
As you can see, the pictures produced by the AutoEncoder are somewhat blurry, but they still capture the features of the input well.
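To go beyond judging by eye, one option is to compute the reconstruction error of each image and look at the worst cases. This is only a sketch: it uses an untrained stand-in model that returns the reconstruction directly and random tensors in place of test images, so the numbers themselves mean nothing.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Untrained stand-in model; the model in this post returns (codes, decoded) instead.
model = nn.Sequential(nn.Linear(784, 2), nn.Linear(2, 784), nn.Sigmoid())
model.eval()

# Random stand-in for a batch of 16 flattened test images.
inputs = torch.rand(16, 784)

with torch.no_grad():
    outputs = model(inputs)

# Mean squared error per image: average over the 784 pixels only.
per_image_error = ((outputs - inputs) ** 2).mean(dim=1)

# Indices of the three images that were reconstructed worst.
worst = torch.argsort(per_image_error, descending=True)[:3]
print(worst.tolist())
```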
Visualizing the Compressed CODE
Because the compressed CODE has only two dimensions, it is easy to visualize:
# coding: utf-8
import torch
import torch.nn as nn
import torch.utils.data as data
import torchvision
import matplotlib.pyplot as plt


# Model structure (the class must be defined here so the saved model can be unpickled)
class AutoEncoder(nn.Module):
    def __init__(self):
        super(AutoEncoder, self).__init__()

        # Encoder
        self.encoder = nn.Sequential(
            nn.Linear(784, 128),
            nn.Tanh(),
            nn.Linear(128, 64),
            nn.Tanh(),
            nn.Linear(64, 16),
            nn.Tanh(),
            nn.Linear(16, 2),
        )

        # Decoder
        self.decoder = nn.Sequential(
            nn.Linear(2, 16),
            nn.Tanh(),
            nn.Linear(16, 64),
            nn.Tanh(),
            nn.Linear(64, 128),
            nn.Tanh(),
            nn.Linear(128, 784),
            nn.Sigmoid()
        )

    def forward(self, inputs):
        codes = self.encoder(inputs)
        decoded = self.decoder(codes)

        return codes, decoded


# Load model
# weights_only=False is needed on recent PyTorch versions, where torch.load
# defaults to loading weights only; use it only for files you trust.
model = torch.load('autoencoder.pth', weights_only=False)
model.eval()
print(model)


# DataLoader
test_set = torchvision.datasets.MNIST(
    root='mnist',
    train=False,
    download=True,
    transform=torchvision.transforms.ToTensor(),
)
test_loader = data.DataLoader(test_set, batch_size=16, shuffle=False)


# Collect the 2-D code and the true digit label of every test image
axis_x = []
axis_y = []
answers = []
with torch.no_grad():
    for images, labels in test_loader:
        inputs = images.view(-1, 28*28)
        answers += labels.tolist()

        code, outputs = model(inputs)
        axis_x += code[:, 0].tolist()
        axis_y += code[:, 1].tolist()


# Scatter plot of the codes, colored by digit
plt.scatter(axis_x, axis_y, c=answers)
plt.colorbar()
plt.show()
Output:
As the plot shows, the CODE compressed by the AutoEncoder already begins to capture the features that distinguish the different pictures. This is really interesting.
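One simple way to put numbers on such a plot is to average the codes of each digit and see where each group is centered. The sketch below uses made-up codes and labels purely for illustration; with the script above, the codes and labels would come from axis_x, axis_y, and answers.

```python
import torch

# Made-up 2-D codes and digit labels, standing in for real encoder outputs.
codes = torch.tensor([[0.0, 0.0], [2.0, 2.0], [1.0, 3.0], [3.0, 1.0]])
labels = torch.tensor([0, 0, 1, 1])

# Average the codes of each digit to find where its group is centered.
centroids = {}
for digit in labels.unique().tolist():
    centroids[digit] = codes[labels == digit].mean(dim=0)

print(centroids[0].tolist())  # [1.0, 1.0]
print(centroids[1].tolist())  # [2.0, 2.0]
```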