Last Updated on 2021-05-12 by Clay
Today I want to record how to use a Deep Convolutional Generative Adversarial Network (DCGAN) to implement a simple image-generation model. I originally wanted to demo it with delicious snack pictures, but the results were not very good, so the half a million snack pictures I downloaded went to waste.
In the end, I used the CelebA dataset from the official demo.
The following code differs somewhat from the official code, but it is basically written with reference to the official model design. If you want to read the official tutorial, you can refer to: https://pytorch.org/tutorials/beginner/dcgan_faces_tutorial.html
Introduction to the CelebA dataset
Large-scale CelebFaces Attributes (CelebA) is a collection of celebrity face pictures, with faces annotated using bounding boxes, established by the Multimedia Laboratory of the Chinese University of Hong Kong.
The dataset covers 10,177 identities and contains 202,599 face pictures, each with a resolution of 178 x 218.
200,000 pictures is quite a lot; these will be the training data for our generative adversarial network.
Introduction to DCGAN
DCGAN, whose full name is Deep Convolutional Generative Adversarial Network, is generally regarded as an extension of the GAN (Generative Adversarial Network).
As the name suggests, it adds the concepts of CNNs to the generative adversarial network. The training architecture is basically divided into two models: a Generator and a Discriminator.
The Generator is responsible for generating pictures from random noise. All of these generated pictures are labeled "fake" (usually 0). The Discriminator, on the other hand, is a binary classifier responsible for telling real pictures apart from fake ones.
A common approach is to train one of the models first, switch to the other model once the first model's loss is low, then switch back when the other model's loss is low, and so on. In this way, the two models compete with each other.
Gradually, the images produced by the Generator become difficult for the Discriminator to distinguish from real ones, and we have achieved our goal: a genuinely useful image-generation model.
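To make the labeling convention concrete, here is a minimal, self-contained sketch of the two loss computations; the discriminator scores are dummy numbers purely for illustration:

import torch
import torch.nn as nn

criterion = nn.BCELoss()

# Dummy discriminator outputs for illustration (probabilities in [0, 1])
d_real = torch.tensor([0.9, 0.8])   # D's scores on real images
d_fake = torch.tensor([0.2, 0.1])   # D's scores on generated images

# Discriminator: push real scores toward 1 and fake scores toward 0
d_loss = criterion(d_real, torch.ones(2)) + criterion(d_fake, torch.zeros(2))

# Generator: push the discriminator's scores on its fakes toward 1
g_loss = criterion(d_fake, torch.ones(2))

print(d_loss.item(), g_loss.item())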
However, the above are all ideal conditions.
In fact, GANs are notoriously difficult to control:
- If the Discriminator is too strong, the Generator's weights become useless no matter how you adjust them;
- If the Discriminator is too weak, even randomly generated pictures from the Generator will be judged as real.
In both cases, the Generator ends up producing very bad pictures.
So overall, a GAN is a very cost-intensive model to train. It is best to save a checkpoint at each stage and keep an eye on whether the Discriminator's and Generator's losses are improving. Supplementing this with TensorBoard is a good idea.
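For example, a minimal logging sketch using PyTorch's built-in TensorBoard support might look like this; the log directory name and the loss values here are my own placeholders, and in the real training loop you would log errD.item() and errG.item() each iteration instead:

from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter('runs/dcgan')  # arbitrary log directory

# Dummy (d_loss, g_loss) pairs standing in for real per-iteration losses
for step, (d_loss, g_loss) in enumerate([(1.2, 0.8), (1.0, 0.9)]):
    writer.add_scalar('Loss/D', d_loss, step)
    writer.add_scalar('Loss/G', g_loss, step)

writer.close()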
Data preparation
The format of the dataset to be prepared is:
celeba
|__ img_align_celeba
Inside img_align_celeba are all our jpg files; the folder one level above img_align_celeba is the path we pass to the DataLoader in the code. Be careful not to get this wrong; it took me some time to realize I had pointed at the wrong path at first.
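A quick way to sanity-check the layout before training, assuming the folder names above (ImageFolder requires at least one subfolder under the root):

import os

dataroot = 'celeba'
subfolder = os.path.join(dataroot, 'img_align_celeba')

# Count the jpg files sitting inside the subfolder
jpg_files = [f for f in os.listdir(subfolder) if f.endswith('.jpg')]
print('Found', len(jpg_files), 'images under', subfolder)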
Download the data: here
Discriminator.py
# -*- coding: utf-8 -*-
import torch.nn as nn


# Discriminator
class Discriminator(nn.Module):
    def __init__(self, inputSize, hiddenSize):
        super(Discriminator, self).__init__()
        self.main = nn.Sequential(
            nn.Conv2d(inputSize, hiddenSize, 4, 2, 1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(hiddenSize, hiddenSize*2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(hiddenSize*2),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(hiddenSize*2, hiddenSize*4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(hiddenSize*4),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(hiddenSize*4, hiddenSize*8, 4, 2, 1, bias=False),
            nn.BatchNorm2d(hiddenSize*8),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(hiddenSize*8, 1, 4, 1, 0, bias=False),
            nn.Sigmoid()
        )

    def forward(self, input):
        return self.main(input)
This can be regarded as a very classic CNN architecture: convolution layers followed by BatchNorm normalization, passed through the LeakyReLU activation function, with a Sigmoid at the end to output a probability.
If we print the model:
Discriminator(
(main): Sequential(
(0): Conv2d(3, 64, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
(1): LeakyReLU(negative_slope=0.2, inplace=True)
(2): Conv2d(64, 128, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
(3): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(4): LeakyReLU(negative_slope=0.2, inplace=True)
(5): Conv2d(128, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
(6): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(7): LeakyReLU(negative_slope=0.2, inplace=True)
(8): Conv2d(256, 512, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
(9): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(10): LeakyReLU(negative_slope=0.2, inplace=True)
(11): Conv2d(512, 1, kernel_size=(4, 4), stride=(1, 1), bias=False)
(12): Sigmoid()
)
)
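As a quick sanity check (my own addition, assuming the Discriminator class above is importable as in the training script below), we can feed a dummy batch through the model and confirm that each 3 x 64 x 64 image is reduced to a single probability:

import torch
from DCGAN.discriminator import Discriminator

netD = Discriminator(3, 64)           # 3 input channels, 64 hidden channels
dummy = torch.randn(8, 3, 64, 64)     # a dummy batch of eight 64 x 64 RGB images
print(netD(dummy).shape)              # torch.Size([8, 1, 1, 1]): one score per image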
Generator.py
# -*- coding: utf-8 -*-
import torch.nn as nn


# Generator
class Generator(nn.Module):
    def __init__(self, inputSize, hiddenSize, outputSize):
        super(Generator, self).__init__()
        self.main = nn.Sequential(
            nn.ConvTranspose2d(inputSize, hiddenSize*8, 4, 1, 0, bias=False),
            nn.BatchNorm2d(hiddenSize*8),
            nn.ReLU(True),
            nn.ConvTranspose2d(hiddenSize*8, hiddenSize*4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(hiddenSize*4),
            nn.ReLU(True),
            nn.ConvTranspose2d(hiddenSize*4, hiddenSize*2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(hiddenSize*2),
            nn.ReLU(True),
            nn.ConvTranspose2d(hiddenSize*2, hiddenSize, 4, 2, 1, bias=False),
            nn.BatchNorm2d(hiddenSize),
            nn.ReLU(True),
            nn.ConvTranspose2d(hiddenSize, outputSize, 4, 2, 1, bias=False),
            nn.Tanh()
        )

    def forward(self, input):
        return self.main(input)
The Generator uses a comparatively less common layer: ConvTranspose2d(), usually called "transposed convolution" (sometimes, loosely, "deconvolution"). Since the Generator takes randomly sampled noise as input and must turn it into a picture, these layers are needed to progressively upsample the input, with backpropagation adjusting the weights until the noise truly forms a plausible picture.
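To see the upsampling in action, here is a small check (my own addition, assuming the Generator class above): a batch of 100-dimensional noise vectors grows from 1 x 1 to a full 3 x 64 x 64 image:

import torch
from DCGAN.generator import Generator

netG = Generator(100, 64, 3)          # 100-dim noise in, 3-channel image out
noise = torch.randn(8, 100, 1, 1)     # eight random latent vectors
print(netG(noise).shape)              # torch.Size([8, 3, 64, 64])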
Train.py
With the above two models defined, we come to the training part.
# -*- coding: utf-8 -*-
import random

import torch
import torch.nn as nn
import torch.optim as optim
import torch.utils.data
import torchvision.datasets as dset
import torchvision.transforms as transforms
import torchvision.utils as vutils
import numpy as np
import matplotlib.pyplot as plt

from DCGAN.generator import Generator
from DCGAN.discriminator import Discriminator
Import the packages we need. DCGAN.generator and DCGAN.discriminator are where my two models live; as long as you can import them normally, you can organize them however you like.
# CUDA
device = 'cuda:0' if torch.cuda.is_available() else 'cpu'
print('GPU State:', device)

# Random seed
manualSeed = 7777
print('Random Seed:', manualSeed)
random.seed(manualSeed)
torch.manual_seed(manualSeed)
First, confirm whether a GPU is available.
In addition, fix the random seeds, both for Python's built-in random module and for PyTorch, so the results are reproducible.
# Attributes
dataroot = 'celeba'
batch_size = 1024
image_size = 64
G_out_D_in = 3
G_in = 100
G_hidden = 64
D_hidden = 64
epochs = 5
lr = 0.001
beta1 = 0.5
These are the hyperparameter settings; you can adjust them as you like. (For reference, the original DCGAN paper and the official tutorial use a smaller learning rate of 0.0002.)
# Data
dataset = dset.ImageFolder(root=dataroot,
                           transform=transforms.Compose([
                               transforms.Resize(image_size),
                               transforms.CenterCrop(image_size),
                               transforms.ToTensor(),
                               transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
                           ]))

# Create the dataLoader
dataLoader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=True)
As mentioned above, note that the root path is the folder one level above the folder that actually contains our pictures.
# Weights
def weights_init(m):
    classname = m.__class__.__name__
    print('classname:', classname)

    if classname.find('Conv') != -1:
        nn.init.normal_(m.weight.data, 0.0, 0.02)
    elif classname.find('BatchNorm') != -1:
        nn.init.normal_(m.weight.data, 1.0, 0.02)
        nn.init.constant_(m.bias.data, 0)
Initialize the weights! This follows the original DCGAN paper, which draws all weights from a normal distribution with mean 0 and standard deviation 0.02. You can also remove this step and see how it affects the results.
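A quick check (assuming the weights_init function above and the Discriminator class from earlier): after applying the initializer, the first Conv layer's weights should have mean close to 0 and standard deviation close to 0.02:

netD = Discriminator(3, 64)
netD.apply(weights_init)

conv_w = netD.main[0].weight.data   # weights of the first Conv2d layer
print(conv_w.mean().item(), conv_w.std().item())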
# Train
def train():
    # Create the generator
    netG = Generator(G_in, G_hidden, G_out_D_in).to(device)
    netG.apply(weights_init)
    print(netG)

    # Create the discriminator
    netD = Discriminator(G_out_D_in, D_hidden).to(device)
    netD.apply(weights_init)
    print(netD)

    # Loss function
    criterion = nn.BCELoss()
    fixed_noise = torch.randn(64, G_in, 1, 1, device=device)

    # Labels must be floats for BCELoss
    real_label = 1.
    fake_label = 0.
    optimizerD = optim.Adam(netD.parameters(), lr=lr, betas=(beta1, 0.999))
    optimizerG = optim.Adam(netG.parameters(), lr=lr, betas=(beta1, 0.999))

    img_list = []
    G_losses = []
    D_losses = []
    iters = 0

    print('Start!')
    for epoch in range(epochs):
        for i, data in enumerate(dataLoader, 0):
            # Update D network: train with an all-real batch first
            netD.zero_grad()
            real_cpu = data[0].to(device)
            b_size = real_cpu.size(0)
            label = torch.full((b_size,), real_label, device=device)
            output = netD(real_cpu).view(-1)
            errD_real = criterion(output, label)
            errD_real.backward()
            D_x = output.mean().item()

            # Then train D with an all-fake batch produced by G
            noise = torch.randn(b_size, G_in, 1, 1, device=device)
            fake = netG(noise)
            label.fill_(fake_label)
            output = netD(fake.detach()).view(-1)
            errD_fake = criterion(output, label)
            errD_fake.backward()
            D_G_z1 = output.mean().item()
            errD = errD_real + errD_fake
            optimizerD.step()

            # Update G network: G wants D to classify its fakes as real
            netG.zero_grad()
            label.fill_(real_label)
            output = netD(fake).view(-1)
            errG = criterion(output, label)
            errG.backward()
            D_G_z2 = output.mean().item()
            optimizerG.step()

            # Output training stats
            if i % 50 == 0:
                print('[%d/%d][%d/%d]\tLoss_D: %.4f\tLoss_G: %.4f\tD(x): %.4f\tD(G(z)): %.4f / %.4f'
                      % (epoch, epochs, i, len(dataLoader), errD.item(), errG.item(), D_x, D_G_z1, D_G_z2))

            # Save losses for plotting later
            G_losses.append(errG.item())
            D_losses.append(errD.item())

            # Check how the generator is doing by saving G's output on fixed_noise
            if (iters % 500 == 0) or ((epoch == epochs - 1) and (i == len(dataLoader) - 1)):
                with torch.no_grad():
                    fake = netG(fixed_noise).detach().cpu()
                img_list.append(vutils.make_grid(fake, padding=2, normalize=True))

            iters += 1

    torch.save(netD, 'netD.pkl')
    torch.save(netG, 'netG.pkl')

    # Also return img_list so the plotting function can use it
    return G_losses, D_losses, img_list
Looking at the actual training loop, you can see that we update the Discriminator and the Generator together in each iteration rather than in separate phases, to keep things simple. This is the method used in the PyTorch tutorial.
Finally, let's plot the losses of our two models to observe how training went, and display a comparison of real and fake pictures:
# Plot
def plotImage(G_losses, D_losses, img_list):
    print('Start to plot!!')
    plt.figure(figsize=(10, 5))
    plt.title("Generator and Discriminator Loss During Training")
    plt.plot(G_losses, label="G")
    plt.plot(D_losses, label="D")
    plt.xlabel("iterations")
    plt.ylabel("Loss")
    plt.legend()
    plt.show()

    # Grab a batch of real images from the dataloader
    real_batch = next(iter(dataLoader))

    # Plot the real images
    plt.figure(figsize=(15, 15))
    plt.subplot(1, 2, 1)
    plt.axis("off")
    plt.title("Real Images")
    plt.imshow(np.transpose(vutils.make_grid(real_batch[0].to(device)[:64], padding=5, normalize=True).cpu(), (1, 2, 0)))

    # Plot the fake images from the last epoch
    plt.subplot(1, 2, 2)
    plt.axis("off")
    plt.title("Fake Images")
    plt.imshow(np.transpose(img_list[-1], (1, 2, 0)))
    plt.show()
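For completeness, here is a minimal entry point that runs training and then plots the results, matching the function signatures above:

if __name__ == '__main__':
    G_losses, D_losses, img_list = train()
    plotImage(G_losses, D_losses, img_list)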
Output:
At thumbnail size, the fake pictures do look like human faces, but if you zoom in you will find that the quality is still far from sufficient.
I hope to have a chance to study how to improve this model in the future. In any case, working with images is quite interesting!
Read More
- [PyTorch] Tutorial(1) What is Tensor?
- [PyTorch] Tutorial(2) Automatic derivative
- [PyTorch] Tutorial(3) Introduction of Neural Networks
- [PyTorch] Tutorial(4) Train a model to classify MNIST dataset
- [PyTorch] Tutorial(5) How to train a model to classify CIFAR-10 database
- [PyTorch] Tutorial(6) Audio Processing Module: torchaudio
- [PyTorch] Tutorial(7) Use Deep Generative Adversarial Network (DCGAN) to generate pictures