
[PyTorch] Tutorial(7) Use Deep Convolutional Generative Adversarial Network (DCGAN) to generate pictures

Last Updated on 2021-05-12 by Clay

Today I want to record how to use a Deep Convolutional Generative Adversarial Network (DCGAN) to implement a simple image generation model. I wanted to demo it with pictures of delicious snacks, but the results were not very good; I downloaded half a million snack pictures in vain.

In the end, I used the CelebA dataset from the official demo.

The following code differs somewhat from the official code, but it is basically written with reference to the official model design. If you want to read the official tutorial, you can refer to: https://pytorch.org/tutorials/beginner/dcgan_faces_tutorial.html


Introduction of CelebA dataset

Large-scale CelebFaces Attributes (CelebA) is a collection of celebrity face pictures in which the faces are labeled with bounding boxes. It was created by the Multimedia Lab of The Chinese University of Hong Kong.

The dataset covers 10,177 people and 202,599 face pictures in total, and each picture has a resolution of 178 x 218.

200,000 pictures is quite a lot. These are the training data for our generative adversarial network.


Introduction of DCGAN

DCGAN is generally regarded as an extension of GAN (Generative Adversarial Network); its full name is Deep Convolutional Generative Adversarial Network.

As the name suggests, it adds the ideas of CNNs to the generative adversarial network. Training involves two models: a Generator and a Discriminator.

The Generator is responsible for generating pictures from randomly sampled noise, trying to make them look like the real pictures. All generated pictures are labeled “fake” (usually 0), while the real pictures are labeled “real” (usually 1). The Discriminator, on the other hand, is a binary classifier responsible for telling real pictures from fake ones.

One way to describe the training is to train one of the models first, switch to the other model once the first model’s loss is low, then switch back when the other model’s loss is low and continue training… In this way, the two models compete with each other.

Gradually, the images produced by the Generator become hard for the Discriminator to tell apart from real ones, and we have achieved our goal: a truly useful image generation model.
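To make the “real = 1, fake = 0” labelling concrete, here is a minimal sketch of binary cross-entropy, the loss that the training script further down uses through nn.BCELoss. The helper name bce and the two example probabilities are made up for illustration; they are not measured values.

```python
import math


def bce(p, y):
    """Binary cross-entropy for one predicted probability p and a 0/1 label y."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


# Discriminator step: real images carry label 1, generated images label 0.
d_real = 0.9  # hypothetical D(x) on a real image
d_fake = 0.1  # hypothetical D(G(z)) on a generated image
loss_d = bce(d_real, 1) + bce(d_fake, 0)

# Generator step: the same fake image is scored against the "real" label,
# so the generator's loss shrinks as the discriminator gets fooled.
loss_g = bce(d_fake, 1)

print(f"loss_D = {loss_d:.4f}, loss_G = {loss_g:.4f}")
# loss_D = 0.2107, loss_G = 2.3026
```

With a confident Discriminator the Generator's loss is large, which is the pressure that pushes it to produce more convincing pictures.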

However, the above are all ideal conditions.

In practice, a GAN is hard to control.

  • If the discriminator is too strong, the generator’s weights will be useless no matter how they are adjusted;
  • If the discriminator is too weak, even randomly generated pictures from the generator will be regarded as real pictures.

In both cases, the generator will produce some very bad pictures.

So overall, a GAN is a costly model to train. It is best to save the model at each stage and to keep an eye on whether the losses of the Discriminator and the Generator are improving. Monitoring them with TensorBoard may be a good idea.
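Saving a model at each stage could look roughly like the sketch below. The function name save_checkpoint, the out_dir argument and the file name pattern are my own assumptions. Note that it stores the state_dict of each network, whereas the training script later in this post saves the whole model objects with torch.save.

```python
import os

import torch
import torch.nn as nn


def save_checkpoint(netG, netD, epoch, out_dir="checkpoints"):
    """Save both networks' weights for one epoch and return the file path."""
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, f"dcgan_epoch_{epoch:03d}.pt")
    torch.save(
        {
            "epoch": epoch,
            "netG": netG.state_dict(),  # generator weights only
            "netD": netD.state_dict(),  # discriminator weights only
        },
        path,
    )
    return path
```

Calling save_checkpoint(netG, netD, epoch) at the end of every epoch leaves one file per stage, so a run that collapses later can be rolled back to an earlier epoch.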


Data preparation

The format of the dataset to be prepared is:

celeba
|__ img_align_celeba

Inside img_align_celeba are all our jpg files, and the folder that contains img_align_celeba (celeba here) is the path we pass to the dataset loader in the code. Be careful not to get this wrong; it took me some time to figure out that I had set the wrong path at first.

Download the data: here
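Since the path is easy to get wrong, a quick check before training can help. The sketch below only uses the standard library; the function name check_celeba_layout is hypothetical. It relies on the fact that torchvision's ImageFolder expects the images to sit inside at least one sub-folder of the root.

```python
from pathlib import Path


def check_celeba_layout(dataroot):
    """Return the number of .jpg files found one level below dataroot."""
    root = Path(dataroot)
    subdirs = [p for p in root.iterdir() if p.is_dir()]
    if not subdirs:
        # Pointing at img_align_celeba itself ends up here.
        raise ValueError(f"{root} has no sub-folder; ImageFolder needs one")
    return sum(
        1
        for d in subdirs
        for f in d.iterdir()
        if f.suffix.lower() == ".jpg"
    )
```

For the layout above, check_celeba_layout('celeba') should report the number of pictures in img_align_celeba, while passing the inner folder raises the error.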


Discriminator.py

# -*- coding: utf-8 -*-
import torch.nn as nn


# Discriminator
class Discriminator(nn.Module):
    def __init__(self, inputSize, hiddenSize):
        super(Discriminator, self).__init__()
        self.main = nn.Sequential(
            nn.Conv2d(inputSize, hiddenSize, 4, 2, 1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),

            nn.Conv2d(hiddenSize, hiddenSize*2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(hiddenSize*2),
            nn.LeakyReLU(0.2, inplace=True),

            nn.Conv2d(hiddenSize*2, hiddenSize*4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(hiddenSize*4),
            nn.LeakyReLU(0.2, inplace=True),

            nn.Conv2d(hiddenSize*4, hiddenSize*8, 4, 2, 1, bias=False),
            nn.BatchNorm2d(hiddenSize*8),
            nn.LeakyReLU(0.2, inplace=True),

            nn.Conv2d(hiddenSize*8, 1, 4, 1, 0, bias=False),
            nn.Sigmoid())

    def forward(self, input):
        return self.main(input)


This is a very classic CNN model: each convolution layer is followed by BatchNorm normalization and the LeakyReLU activation function (the first block has no BatchNorm), and the final convolution feeds a Sigmoid so that the output is a probability.

Printing the model gives:

Discriminator(
  (main): Sequential(
    (0): Conv2d(3, 64, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
    (1): LeakyReLU(negative_slope=0.2, inplace=True)
    (2): Conv2d(64, 128, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
    (3): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (4): LeakyReLU(negative_slope=0.2, inplace=True)
    (5): Conv2d(128, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
    (6): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (7): LeakyReLU(negative_slope=0.2, inplace=True)
    (8): Conv2d(256, 512, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
    (9): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (10): LeakyReLU(negative_slope=0.2, inplace=True)
    (11): Conv2d(512, 1, kernel_size=(4, 4), stride=(1, 1), bias=False)
    (12): Sigmoid()
  )
)
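The printed layers can be checked by tracing the spatial size through them. This is a small sketch using the output-size formula for Conv2d (with dilation 1); the helper name conv_out is mine, and the input size 64 is the image_size set later in the training script.

```python
def conv_out(size, kernel, stride, padding):
    """Output width/height of a square convolution (Conv2d, dilation=1)."""
    return (size + 2 * padding - kernel) // stride + 1


# (kernel, stride, padding) of the five Conv2d layers in the Discriminator
layers = [(4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 1, 0)]

sizes = [64]  # a 64 x 64 input image
for kernel, stride, padding in layers:
    sizes.append(conv_out(sizes[-1], kernel, stride, padding))

print(sizes)  # [64, 32, 16, 8, 4, 1]
```

Each stride-2 layer halves the picture, and the last layer collapses the remaining 4 x 4 map into a single value per image, which Sigmoid turns into the real/fake probability.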

Generator.py

# -*- coding: utf-8 -*-
import torch.nn as nn


# Generator
class Generator(nn.Module):
    def __init__(self, inputSize, hiddenSize, outputSize):
        super(Generator, self).__init__()
        self.main = nn.Sequential(
            nn.ConvTranspose2d(inputSize, hiddenSize*8, 4, 1, 0, bias=False),
            nn.BatchNorm2d(hiddenSize*8),
            nn.ReLU(True),

            nn.ConvTranspose2d(hiddenSize*8, hiddenSize*4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(hiddenSize*4),
            nn.ReLU(True),

            nn.ConvTranspose2d(hiddenSize*4, hiddenSize*2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(hiddenSize*2),
            nn.ReLU(True),

            nn.ConvTranspose2d(hiddenSize*2, hiddenSize, 4, 2, 1, bias=False),
            nn.BatchNorm2d(hiddenSize),
            nn.ReLU(True),

            nn.ConvTranspose2d(hiddenSize, outputSize, 4, 2, 1, bias=False),
            nn.Tanh())

    def forward(self, input):
        return self.main(input)



The Generator uses a layer that is seen less often: ConvTranspose2d(). It is usually called “transposed convolution” or, less precisely, “deconvolution”. Since the Generator takes randomly sampled noise as input and has to produce a picture, it uses this layer to upsample the noise step by step, and backpropagation adjusts the weights until the noise turns into a realistic picture.
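To see how the transposed convolutions grow a 1 x 1 noise vector into a picture, the spatial sizes can be traced with the ConvTranspose2d output-size formula (assuming dilation 1 and no output_padding, as in the code above). The helper name deconv_out is my own.

```python
def deconv_out(size, kernel, stride, padding):
    """Output size of ConvTranspose2d with dilation=1 and output_padding=0."""
    return (size - 1) * stride - 2 * padding + kernel


# (kernel, stride, padding) of the five ConvTranspose2d layers in the Generator
layers = [(4, 1, 0), (4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 2, 1)]

sizes = [1]  # the noise enters as a 1 x 1 map with inputSize channels
for kernel, stride, padding in layers:
    sizes.append(deconv_out(sizes[-1], kernel, stride, padding))

print(sizes)  # [1, 4, 8, 16, 32, 64]
```

The Generator is the mirror image of the Discriminator: the first layer expands the noise to 4 x 4, and every following layer doubles the size until it reaches 64 x 64.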


Train.py

With the two models defined, we can move on to the training part.

# -*- coding: utf-8 -*-
import random
import torch
import torch.nn as nn
import torch.optim as optim
import torch.utils.data
import torchvision.datasets as dset
import torchvision.transforms as transforms
import torchvision.utils as vutils
import numpy as np
import matplotlib.pyplot as plt
from DCGAN.generator import Generator
from DCGAN.discriminator import Discriminator



Import the packages we need. DCGAN.generator and DCGAN.discriminator are where my two models are defined; you can place them wherever you like, as long as they can be imported.

# CUDA
device = 'cuda:0' if torch.cuda.is_available() else 'cpu'
print('GPU State:', device)


# Random seed
manualSeed = 7777
print('Random Seed:', manualSeed)
random.seed(manualSeed)
torch.manual_seed(manualSeed)



Check whether a GPU is available.

In addition, fix the random seeds, both for Python’s random module and for PyTorch.
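The snippet above calls random.seed and torch.manual_seed. If other parts of your code draw random numbers from NumPy, you may want to seed it as well. The helper below is a sketch under that assumption; the name seed_everything is hypothetical.

```python
import random

import numpy as np
import torch


def seed_everything(seed):
    """Seed Python's random module, NumPy, and PyTorch with the same value."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)


# Re-seeding reproduces the same noise tensor.
seed_everything(7777)
first = torch.randn(2, 3)
seed_everything(7777)
second = torch.randn(2, 3)

print(torch.equal(first, second))  # True
```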

# Attributes
dataroot = 'celeba'

batch_size = 1024
image_size = 64
G_out_D_in = 3
G_in = 100
G_hidden = 64
D_hidden = 64

epochs = 5
lr = 0.001
beta1 = 0.5



These are the hyperparameters; feel free to adjust them.
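It helps to know how many iterations these settings produce, because the training loop below logs every 50 batches and saves sample images every 500 iterations. The arithmetic below assumes that all 202,599 CelebA pictures are in the folder and that the DataLoader keeps the last, smaller batch (its default behaviour).

```python
import math

num_images = 202_599  # CelebA size quoted earlier in this post
batch_size = 1024
epochs = 5

batches_per_epoch = math.ceil(num_images / batch_size)
last_batch = num_images - (batches_per_epoch - 1) * batch_size
total_iters = batches_per_epoch * epochs

# Batch indices with i % 50 == 0 trigger a log line in the training loop.
logs_per_epoch = len(range(0, batches_per_epoch, 50))

print(batches_per_epoch, last_batch, total_iters, logs_per_epoch)
# 198 871 990 4
```

So with batch_size = 1024 the whole run is under a thousand weight updates, which is worth keeping in mind when judging the quality of the results.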

# Data
dataset = dset.ImageFolder(root=dataroot,
                           transform=transforms.Compose([
                               transforms.Resize(image_size),
                               transforms.CenterCrop(image_size),
                               transforms.ToTensor(),
                               transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
                           ]))

# Create the dataLoader
dataLoader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=True)



As mentioned above, please note that the root path is the folder one level above the folder that holds our pictures.
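What the transforms do to a 178 x 218 picture can be worked out by hand. The sketch below mirrors the arithmetic in plain Python; the helper names are mine, and the truncation in resize_shorter_edge is my understanding of how torchvision rounds when Resize is given a single integer.

```python
def resize_shorter_edge(width, height, size):
    """Scale so the shorter edge equals size, keeping the aspect ratio."""
    if width <= height:
        return size, int(size * height / width)
    return int(size * width / height), size


def normalize(x, mean=0.5, std=0.5):
    """Per-channel Normalize: maps ToTensor's [0, 1] range to [-1, 1]."""
    return (x - mean) / std


def denormalize(y, mean=0.5, std=0.5):
    """Inverse of normalize, useful before displaying a generated image."""
    return y * std + mean


w, h = resize_shorter_edge(178, 218, 64)
print(w, h)                            # 64 78
print(normalize(0.0), normalize(1.0))  # -1.0 1.0
print(denormalize(-1.0))               # 0.0
```

CenterCrop(64) then trims the 64 x 78 picture to 64 x 64, and the [-1, 1] range matches the Tanh at the end of the Generator.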

# Weights
def weights_init(m):
    classname = m.__class__.__name__
    print('classname:', classname)

    if classname.find('Conv') != -1:
        nn.init.normal_(m.weight.data, 0.0, 0.02)
    elif classname.find('BatchNorm') != -1:
        nn.init.normal_(m.weight.data, 1.0, 0.02)
        nn.init.constant_(m.bias.data, 0)



Initialize the weights! This follows the original DCGAN paper, which draws the weights from a normal distribution with mean 0 and standard deviation 0.02. You can also remove this step and see how the results change.

# Train
def train():
    # Create the generator
    netG = Generator(G_in, G_hidden, G_out_D_in).to(device)
    netG.apply(weights_init)
    print(netG)

    # Create the discriminator
    netD = Discriminator(G_out_D_in, D_hidden).to(device)
    netD.apply(weights_init)
    print(netD)

    # Loss function
    criterion = nn.BCELoss()
    fixed_noise = torch.randn(64, G_in, 1, 1, device=device)

    real_label = 1
    fake_label = 0
    optimizerD = optim.Adam(netD.parameters(), lr=lr, betas=(beta1, 0.999))
    optimizerG = optim.Adam(netG.parameters(), lr=lr, betas=(beta1, 0.999))

    img_list = []
    G_losses = []
    D_losses = []
    iters = 0
    print('Start!')

    for epoch in range(epochs):
        for i, data in enumerate(dataLoader, 0):
            # Update D network
            netD.zero_grad()
            real_cpu = data[0].to(device)
            b_size = real_cpu.size(0)
            # BCELoss needs float targets; an int fill value would give an integer tensor
            label = torch.full((b_size,), real_label, dtype=torch.float, device=device)
            output = netD(real_cpu).view(-1)

            errD_real = criterion(output, label)
            errD_real.backward()
            D_x = output.mean().item()

            noise = torch.randn(b_size, G_in, 1, 1, device=device)
            fake = netG(noise)
            label.fill_(fake_label)
            output = netD(fake.detach()).view(-1)

            errD_fake = criterion(output, label)
            errD_fake.backward()

            D_G_z1 = output.mean().item()
            errD = errD_real + errD_fake
            optimizerD.step()

            # Update G network
            netG.zero_grad()
            label.fill_(real_label)
            output = netD(fake).view(-1)
            errG = criterion(output, label)
            errG.backward()
            D_G_z2 = output.mean().item()
            optimizerG.step()

            # Output training stats
            if i % 50 == 0:
                print('[%d/%d][%d/%d]\tLoss_D: %.4f\tLoss_G: %.4f\tD(x): %.4f\tD(G(z)): %.4f / %.4f' % (epoch, epochs, i, len(dataLoader), errD.item(), errG.item(), D_x, D_G_z1, D_G_z2))

            # Save Losses for plotting later
            G_losses.append(errG.item())
            D_losses.append(errD.item())

            # Check how the generator is doing by saving G's output on fixed_noise
            if (iters % 500 == 0) or ((epoch == epochs - 1) and (i == len(dataLoader) - 1)):
                with torch.no_grad():
                    fake = netG(fixed_noise).detach().cpu()

                img_list.append(vutils.make_grid(fake, padding=2, normalize=True))

            iters += 1

    torch.save(netD, 'netD.pkl')
    torch.save(netG, 'netG.pkl')

    return G_losses, D_losses, img_list



In the training loop, we update the Discriminator and the Generator in the same iteration, one after the other, to keep things simple. This is the method used in the PyTorch tutorial.

Finally, let’s plot the loss of both models to see how training went, and show real and generated pictures side by side. plotImage() takes the img_list returned by train() so that it can display the last grid of generated images:

# Plot
def plotImage(G_losses, D_losses, img_list):
    print('Start to plot!!')
    plt.figure(figsize=(10, 5))
    plt.title("Generator and Discriminator Loss During Training")
    plt.plot(G_losses, label="G")
    plt.plot(D_losses, label="D")
    plt.xlabel("iterations")
    plt.ylabel("Loss")
    plt.legend()
    plt.show()

    # Grab a batch of real images from the dataloader
    real_batch = next(iter(dataLoader))

    # Plot the real images
    plt.figure(figsize=(15, 15))
    plt.subplot(1, 2, 1)
    plt.axis("off")
    plt.title("Real Images")
    plt.imshow(np.transpose(vutils.make_grid(real_batch[0].to(device)[:64], padding=5, normalize=True).cpu(), (1, 2, 0)))

    # Plot the fake images from the last epoch
    plt.subplot(1, 2, 2)
    plt.axis("off")
    plt.title("Fake Images")
    plt.imshow(np.transpose(img_list[-1], (1, 2, 0)))
    plt.show()



Output:

In the small picture, the fake images may look like human faces, but if you zoom in, you will find that they are far from complete.

I hope I will get a chance to study how to improve this model in the future. In any case, working with images is quite interesting!

