[Machine Learning] Introduction To AutoEncoder (With PyTorch Code)

What Is an AutoEncoder?

AutoEncoder is often abbreviated as AE. It is a neural network for unsupervised learning; in other words, it does not require labeled data.

So what is the purpose of AutoEncoder?

The AutoEncoder family is actually huge, with quite a few variants suited to all kinds of tasks. But if you want a brief description of what an AutoEncoder does, I think it can be summarized by the following picture.

[Figure: AutoEncoder architecture: input → Encoder → code → Decoder → output]

The AutoEncoder architecture is divided into two parts: an Encoder and a Decoder. First, the input is fed into the Encoder, whose neural network compresses it into a low-dimensional code (the "code" in the picture). The code is then fed into the Decoder, which decodes it into the final output.

We then choose the loss function so that the output looks as similar to the input as possible.

The most intuitive understanding is that an AutoEncoder can perform de-noising and dimensionality reduction.

When the input is encoded into a low-dimensional code by the Encoder, and we can decode that code to produce an output very similar to the input, we can reasonably believe that the code represents the low-dimensional features of the entire input.

In this way, we may be able to use the compressed code for subsequent deep learning processing to reduce the computational cost.
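For example, here is a minimal sketch of that idea: training a tiny classifier on the 2-dimensional codes instead of the raw 784-dimensional pixels. It assumes the trained AutoEncoder built later in this article is in scope as model, together with the same train_loader; the one-layer classifier itself is a hypothetical example.

# A minimal sketch: train a small classifier on the 2-D codes instead of
# the raw 784-D pixels. Assumes `model` (the trained AutoEncoder built
# below) and `train_loader` are already in scope.
import torch
import torch.nn as nn

classifier = nn.Linear(2, 10)  # hypothetical: 2-D code -> 10 digit classes
clf_optimizer = torch.optim.Adam(classifier.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()

for images, labels in train_loader:
    with torch.no_grad():  # keep the encoder frozen
        codes = model.encoder(images.view(-1, 784))

    logits = classifier(codes)
    loss = criterion(logits, labels)

    clf_optimizer.zero_grad()
    loss.backward()
    clf_optimizer.step()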

After all, if the low-dimensional data retains essentially the same features as the high-dimensional data, then computing on the high-dimensional data is bound to be slower for no benefit.

In addition, low-dimensional data is also better suited to visualization.

Not only that: if we can control the values of the intermediate code, we can even use the code to generate fake data.
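As a quick sketch of what that looks like (assuming the model trained later in this article is in scope as model; the code values here are picked arbitrarily):

# A sketch: decode a hand-picked 2-D code into a fake image. Assumes the
# trained AutoEncoder from later in this article is in scope as `model`;
# the code values are arbitrary.
import torch
import matplotlib.pyplot as plt

with torch.no_grad():
    fake_code = torch.tensor([[0.5, -0.3]])  # an arbitrary 2-D code
    fake_image = model.decoder(fake_code)    # shape: (1, 784)

plt.imshow(fake_image.view(28, 28), cmap='gray')
plt.show()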

So what does de-noising mean?

If we encode the input into a low-dimensional code, decode that code back into an output of the same dimensionality as the input, and require that "the more similar the input and output, the better", then we can understand the code as having learned the important features of the input while discarding the unimportant ones.

I think this is the principle behind de-noising. For more details, you may refer to the DAE (Denoising AutoEncoder).
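To give a rough feel for the idea (this is a sketch of the general recipe, not a full DAE implementation), a denoising training step corrupts the input with noise but computes the loss against the clean input. It reuses model, loss_function, optimizer, and train_loader as defined later in this article; the noise level 0.3 is an arbitrary choice.

# A sketch of one denoising training pass: reconstruct the clean input
# from a noise-corrupted version of it. Assumes `model`, `loss_function`,
# `optimizer`, and `train_loader` as defined below; 0.3 is arbitrary.
for images, labels in train_loader:
    inputs = images.view(-1, 784)
    noisy_inputs = (inputs + 0.3 * torch.randn_like(inputs)).clamp(0., 1.)

    codes, decoded = model(noisy_inputs)   # reconstruct from the noisy input
    loss = loss_function(decoded, inputs)  # compare against the clean input

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()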

So below, I use PyTorch to build a simple AutoEncoder model. The input data is the classic MNIST. The goal is to produce outputs that look as much like the input pictures as possible, and to visualize the data using the compressed, dimension-reduced code in the middle.


AutoEncoder Built with PyTorch

Below, I explain step by step how I built the AutoEncoder model.

First, we import all the packages we need.

# coding: utf-8
import torch
import torch.nn as nn
import torch.utils.data as data
import torchvision


Then we set the hyperparameters, such as epochs, batch_size, and lr (the learning rate), and load the MNIST dataset from torchvision.

# Settings
epochs = 10
batch_size = 128
lr = 0.008


# DataLoader
train_set = torchvision.datasets.MNIST(
    root='mnist',
    train=True,
    download=True,
    transform=torchvision.transforms.ToTensor(),
)
train_loader = data.DataLoader(train_set, batch_size=batch_size, shuffle=True)
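As an optional sanity check, we can peek at one batch to confirm the shapes we are about to flatten:

# Optional sanity check: each batch holds (batch_size, 1, 28, 28) images
# and (batch_size,) labels; we will flatten each image to a 784-D vector.
images, labels = next(iter(train_loader))
print(images.shape)  # torch.Size([128, 1, 28, 28])
print(labels.shape)  # torch.Size([128])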


Next, define the AutoEncoder model architecture. As mentioned above, it is divided into two parts: the Encoder and the Decoder.

# Model structure
class AutoEncoder(nn.Module):
    def __init__(self):
        super(AutoEncoder, self).__init__()

        # Encoder
        self.encoder = nn.Sequential(
            nn.Linear(784, 128),
            nn.Tanh(),
            nn.Linear(128, 64),
            nn.Tanh(),
            nn.Linear(64, 16),
            nn.Tanh(),
            nn.Linear(16, 2),
        )

        # Decoder
        self.decoder = nn.Sequential(
            nn.Linear(2, 16),
            nn.Tanh(),
            nn.Linear(16, 64),
            nn.Tanh(),
            nn.Linear(64, 128),
            nn.Tanh(),
            nn.Linear(128, 784),
            nn.Sigmoid()
        )

    def forward(self, inputs):
        codes = self.encoder(inputs)
        decoded = self.decoder(codes)

        return codes, decoded


Instantiate the model and set up the optimizer and the loss function.

# Optimizer and loss function
model = AutoEncoder()
optimizer = torch.optim.Adam(model.parameters(), lr=lr)
loss_function = nn.MSELoss()
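Since ToTensor scales pixels into [0, 1] and the decoder ends with a Sigmoid, binary cross-entropy is a common alternative reconstruction loss. Swapping it in is a one-line change (shown as an alternative, not what the rest of this article uses):

# Alternative reconstruction loss: both the decoder output and the pixel
# values live in [0, 1], so binary cross-entropy also works here.
loss_function = nn.BCELoss()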


Then we start training.

# Train
for epoch in range(epochs):
    for images, labels in train_loader:  # the labels are not needed for reconstruction
        inputs = images.view(-1, 784)  # flatten each 28x28 image into a 784-D vector

        # Forward
        codes, decoded = model(inputs)

        # Backward
        optimizer.zero_grad()
        loss = loss_function(decoded, inputs)
        loss.backward()
        optimizer.step()

    # Show progress
    print('[{}/{}] Loss:'.format(epoch+1, epochs), loss.item())


# Save
torch.save(model, 'autoencoder.pth')
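Note that torch.save(model, ...) pickles the entire model object, so loading it later requires the AutoEncoder class definition to be available (which is why the test scripts below redefine the class). A common alternative, shown here only as a sketch, is to save just the weights:

# Alternative: save only the weights (the state_dict). Loading then becomes:
#     model = AutoEncoder()
#     model.load_state_dict(torch.load('autoencoder_weights.pth'))
torch.save(model.state_dict(), 'autoencoder_weights.pth')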



Output:

[1/10] Loss: 0.04639464616775513
[2/10] Loss: 0.04818795993924141
[3/10] Loss: 0.038940753787755966
[4/10] Loss: 0.039030447602272034
[5/10] Loss: 0.041724737733602524
[6/10] Loss: 0.03994645178318024
[7/10] Loss: 0.03632541000843048
[8/10] Loss: 0.041585564613342285
[9/10] Loss: 0.036579448729753494
[10/10] Loss: 0.04153323173522949

Note that the printed value is just the loss of the last batch in each epoch, which is why it fluctuates rather than decreasing monotonically. With training done, the model is saved to autoencoder.pth.

The following is the complete training code:

# coding: utf-8
import torch
import torch.nn as nn
import torch.utils.data as data
import torchvision


# Settings
epochs = 10
batch_size = 128
lr = 0.008


# DataLoader
train_set = torchvision.datasets.MNIST(
    root='mnist',
    train=True,
    download=True,
    transform=torchvision.transforms.ToTensor(),
)
train_loader = data.DataLoader(train_set, batch_size=batch_size, shuffle=True)


# Model structure
class AutoEncoder(nn.Module):
    def __init__(self):
        super(AutoEncoder, self).__init__()

        # Encoder
        self.encoder = nn.Sequential(
            nn.Linear(784, 128),
            nn.Tanh(),
            nn.Linear(128, 64),
            nn.Tanh(),
            nn.Linear(64, 16),
            nn.Tanh(),
            nn.Linear(16, 2),
        )

        # Decoder
        self.decoder = nn.Sequential(
            nn.Linear(2, 16),
            nn.Tanh(),
            nn.Linear(16, 64),
            nn.Tanh(),
            nn.Linear(64, 128),
            nn.Tanh(),
            nn.Linear(128, 784),
            nn.Sigmoid()
        )

    def forward(self, inputs):
        codes = self.encoder(inputs)
        decoded = self.decoder(codes)

        return codes, decoded


# Optimizer and loss function
model = AutoEncoder()
optimizer = torch.optim.Adam(model.parameters(), lr=lr)
loss_function = nn.MSELoss()


# Train
for epoch in range(epochs):
    for images, labels in train_loader:  # the labels are not needed for reconstruction
        inputs = images.view(-1, 784)  # flatten each 28x28 image into a 784-D vector

        # Forward
        codes, decoded = model(inputs)

        # Backward
        optimizer.zero_grad()
        loss = loss_function(decoded, inputs)
        loss.backward()
        optimizer.step()

    # Show progress
    print('[{}/{}] Loss:'.format(epoch+1, epochs), loss.item())


# Save
torch.save(model, 'autoencoder.pth')



Testing the Model

Now that we have trained the AutoEncoder model, let's take a look at the pictures restored from the compressed code. Do they look like the original pictures?

# coding: utf-8
import torch
import torch.nn as nn
import torch.utils.data as data
import torchvision
import numpy as np
import matplotlib.pyplot as plt


# Settings
plt.rcParams['figure.figsize'] = (10.0, 8.0)
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'


# Show images
def show_images(images):
    sqrtn = int(np.ceil(np.sqrt(images.shape[0])))

    for index, image in enumerate(images):
        plt.subplot(sqrtn, sqrtn, index+1)
        plt.imshow(image.reshape(28, 28))
        plt.axis('off')


# Model structure
class AutoEncoder(nn.Module):
    def __init__(self):
        super(AutoEncoder, self).__init__()

        # Encoder
        self.encoder = nn.Sequential(
            nn.Linear(784, 128),
            nn.Tanh(),
            nn.Linear(128, 64),
            nn.Tanh(),
            nn.Linear(64, 16),
            nn.Tanh(),
            nn.Linear(16, 2),
        )

        # Decoder
        self.decoder = nn.Sequential(
            nn.Linear(2, 16),
            nn.Tanh(),
            nn.Linear(16, 64),
            nn.Tanh(),
            nn.Linear(64, 128),
            nn.Tanh(),
            nn.Linear(128, 784),
            nn.Sigmoid()
        )

    def forward(self, inputs):
        codes = self.encoder(inputs)
        decoded = self.decoder(codes)

        return codes, decoded


# Load model
model = torch.load('autoencoder.pth')
model.eval()
print(model)


# DataLoader
test_set = torchvision.datasets.MNIST(
    root='mnist',
    train=False,
    download=True,
    transform=torchvision.transforms.ToTensor(),
)
test_loader = data.DataLoader(test_set, batch_size=16, shuffle=False)


# Test
with torch.no_grad():
    for images, labels in test_loader:
        inputs = images.view(-1, 28*28)
        show_images(inputs)
        plt.show()

        code, outputs = model(inputs)
        show_images(outputs)
        plt.show()
        break  # only visualize the first batch



Output:

[Figure: the original images]
[Figure: the pictures produced by the AutoEncoder]

As you can see, the pictures produced by the AutoEncoder are somewhat blurry, but they still capture the important features of the input well.
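If you want a number rather than an eyeball test, one rough way (a sketch reusing model and test_loader from the script above) is to average the per-image MSE over the whole test set:

# A sketch: quantify reconstruction quality as the mean per-image MSE
# over the test set. Reuses `model` and `test_loader` from above.
total_error = 0.0
count = 0
with torch.no_grad():
    for images, labels in test_loader:
        inputs = images.view(-1, 28*28)
        _, outputs = model(inputs)
        total_error += ((outputs - inputs) ** 2).mean(dim=1).sum().item()
        count += inputs.shape[0]

print('Mean reconstruction MSE:', total_error / count)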


Visualize With the Compressed Code

The compressed code can easily be used for visualization:

# coding: utf-8
import torch
import torch.nn as nn
import torch.utils.data as data
import torchvision
import matplotlib.pyplot as plt


# Model structure
class AutoEncoder(nn.Module):
    def __init__(self):
        super(AutoEncoder, self).__init__()

        # Encoder
        self.encoder = nn.Sequential(
            nn.Linear(784, 128),
            nn.Tanh(),
            nn.Linear(128, 64),
            nn.Tanh(),
            nn.Linear(64, 16),
            nn.Tanh(),
            nn.Linear(16, 2),
        )

        # Decoder
        self.decoder = nn.Sequential(
            nn.Linear(2, 16),
            nn.Tanh(),
            nn.Linear(16, 64),
            nn.Tanh(),
            nn.Linear(64, 128),
            nn.Tanh(),
            nn.Linear(128, 784),
            nn.Sigmoid()
        )

    def forward(self, inputs):
        codes = self.encoder(inputs)
        decoded = self.decoder(codes)

        return codes, decoded


# Load model
model = torch.load('autoencoder.pth')
model.eval()
print(model)


# DataLoader
test_set = torchvision.datasets.MNIST(
    root='mnist',
    train=False,
    download=True,
    transform=torchvision.transforms.ToTensor(),
)
test_loader = data.DataLoader(test_set, batch_size=16, shuffle=False)


axis_x = []
axis_y = []
answers = []
with torch.no_grad():
    for images, labels in test_loader:
        inputs = images.view(-1, 28*28)
        answers += labels.tolist()

        code, outputs = model(inputs)
        axis_x += code[:, 0].tolist()
        axis_y += code[:, 1].tolist()


plt.scatter(axis_x, axis_y, c=answers)
plt.colorbar()
plt.show()



Output:

[Figure: scatter plot of the 2-D codes; different colors represent different digits]

As you can see, the code compressed by the AutoEncoder already does a fairly good job of separating the different digits according to their features. This is really interesting.
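One fun follow-up, sketched here under the same setup (reusing model, torch, and plt from the script above; the sweep range [-1, 1] and the grid size are arbitrary choices): decode a grid of points in the 2-D code space to see how the generated digits morph across the latent plane.

# A sketch: decode a grid of 2-D codes and plot the resulting images.
# Reuses `model`, `torch`, and `plt` from the script above; the range
# [-1, 1] and the 8x8 grid are arbitrary choices.
import numpy as np

n = 8
grid = np.linspace(-1.0, 1.0, n)
with torch.no_grad():
    for i, y in enumerate(grid):
        for j, x in enumerate(grid):
            code = torch.tensor([[x, y]], dtype=torch.float32)
            image = model.decoder(code).view(28, 28)
            plt.subplot(n, n, i * n + j + 1)
            plt.imshow(image, cmap='gray')
            plt.axis('off')
plt.show()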

