Last Updated on 2021-05-19 by Clay
A generative adversarial network (GAN) is a well-known neural network model. We feed a batch of random "noise" into the Generator, which produces a batch of digit images, and the Discriminator then judges whether each image is real or generated.
We first train a simple Discriminator (a classifier), then train the Generator, and let the two models compete and improve against each other over and over. In the end we keep the trained Generator: feed it random noise and it can produce images at will!
The principle behind a GAN really is that simple.
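For reference, this adversarial game is usually written as the minimax objective from the original GAN paper (Goodfellow et al., 2014), which the Discriminator D tries to maximize and the Generator G tries to minimize:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]$$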
MNIST
MNIST is a very famous dataset of handwritten digits. If you already know what the MNIST dataset is, feel free to skip this short section.
It would not be an exaggeration to call it the "Hello World" of machine learning.
The MNIST dataset contains 60,000 training images and 10,000 test images. The 70,000 images are said to have been written by high school students and Census Bureau employees. Each image is 28 x 28 pixels, and every pixel is a single grayscale value.
What makes the dataset valuable is that every image already comes with a label: the digit 0 through 9 (often one-hot encoded when training a classifier).
If you want to learn more, you can visit: http://yann.lecun.com/exdb/mnist/
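If you want to take a quick look at the data yourself, a minimal snippet like this (assuming torchvision is installed; the training script below downloads the same data again with normalization) prints the dataset sizes and one sample:

from torchvision import datasets, transforms

# Download MNIST and inspect one raw sample
train_set = datasets.MNIST('mnist/', train=True, download=True, transform=transforms.ToTensor())
test_set = datasets.MNIST('mnist/', train=False, download=True, transform=transforms.ToTensor())

image, label = train_set[0]
print(len(train_set), len(test_set))   # 60000 10000
print(image.shape, label)              # torch.Size([1, 28, 28]) and an integer label from 0 to 9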
Model Definition
The following code is written with the PyTorch framework. If you are interested in PyTorch, you can check out my earlier posts "[PyTorch Tutorial] Getting Started: Setting Up Tensors" or, more closely related to GANs, "[PyTorch Tutorial] Image: DCGAN - Generating Images with a Generative Adversarial Network".
Below I walk through the code piece by piece; the complete code is listed at the end.
First, the models are defined separately in the file model.py.
# -*- coding: utf-8 -*-
import torch.nn as nn


class discriminator(nn.Module):
    def __init__(self):
        super(discriminator, self).__init__()
        self.main = nn.Sequential(
            nn.Linear(784, 256),
            nn.LeakyReLU(0.2),
            nn.Linear(256, 256),
            nn.LeakyReLU(0.2),
            nn.Linear(256, 1),
            nn.Sigmoid()
        )

    def forward(self, input):
        return self.main(input)


class generator(nn.Module):
    def __init__(self):
        super(generator, self).__init__()
        self.main = nn.Sequential(
            nn.Linear(128, 1024),
            nn.ReLU(),
            nn.Linear(1024, 1024),
            nn.ReLU(),
            nn.Linear(1024, 784),
            nn.Tanh()
        )

    def forward(self, input):
        return self.main(input)
For the Discriminator, since it is a classifier for MNIST images, the input tensor size is 784 (an MNIST image is 28x28, flattened). The fully connected layers then shrink the number of neurons until only one remains, which goes through a Sigmoid activation to judge whether the image is real or fake. (For the Sigmoid function, see my earlier note "Machine Learning Notes: Sigmoid function".)
The Generator takes noise as input and generates a 28x28 (784-dimensional) image. Again, I only use fully connected layers here.
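As a quick sanity check (my own addition, using the model.py above), you can push a dummy batch through both networks and confirm that the shapes line up:

import torch
from model import discriminator, generator

D = discriminator()
G = generator()

flat_images = torch.rand(4, 784)   # a dummy batch of 4 flattened 28x28 images
noise = torch.rand(4, 128)         # a dummy batch of 4 noise vectors

print(D(flat_images).shape)        # torch.Size([4, 1]):   one real/fake probability per image
print(G(noise).shape)              # torch.Size([4, 784]): one flattened image per noise vector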
To be clear, this is probably not the best model configuration. Treat it as a reference and try different configurations yourself; it is quite fun.
Training
First, import all the packages we will use.
# -*- coding: utf-8 -*-
import time
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import transforms
from model import discriminator, generator
import numpy as np
import matplotlib.pyplot as plt
Here we first record the start time (you can remove this if you do not need timing), and then set up the plotting function show_images.
start_time = time.time()
plt.rcParams['image.cmap'] = 'gray'


def show_images(images):
    sqrtn = int(np.ceil(np.sqrt(images.shape[0])))
    for index, image in enumerate(images):
        plt.subplot(sqrtn, sqrtn, index+1)
        plt.imshow(image.reshape(28, 28))
Here we define the loss functions for the Discriminator and the Generator. The Discriminator's loss measures the distance between the model's predictions and the true labels (real vs. fake). The Generator's loss is the BCE between the Discriminator's outputs on the fake images and an all-ones target; in other words, the Generator wants the Discriminator to label its fakes as real as often as possible.
# Discriminator Loss => BCELoss
def d_loss_function(inputs, targets):
    return nn.BCELoss()(inputs, targets)


def g_loss_function(inputs):
    targets = torch.ones([inputs.shape[0], 1])
    targets = targets.to(device)
    return nn.BCELoss()(inputs, targets)
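For intuition, here is a tiny standalone example (not part of the training script) of what BCELoss returns: a confident, correct prediction yields a small loss, while a confident, wrong one yields a large loss.

import torch
import torch.nn as nn

bce = nn.BCELoss()

# Predicted probability of "real" vs. the desired target
print(bce(torch.tensor([[0.9]]), torch.tensor([[1.0]])).item())  # ~0.105 (confident and correct)
print(bce(torch.tensor([[0.1]]), torch.tensor([[1.0]])).item())  # ~2.303 (confident and wrong)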
Next, we set the parameters needed for training and load the training data. The real data is mainly used to train the Discriminator; training the Generator only requires randomly generated noise.
# GPU
device = 'cuda:0' if torch.cuda.is_available() else 'cpu'
print('GPU State:', device)

# Model
G = generator().to(device)
D = discriminator().to(device)
print(G)
print(D)

# Settings
epochs = 200
lr = 0.0002
batch_size = 64
g_optimizer = optim.Adam(G.parameters(), lr=lr, betas=(0.5, 0.999))
d_optimizer = optim.Adam(D.parameters(), lr=lr, betas=(0.5, 0.999))

# Transform
transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.5,), (0.5,))])

# Load data
train_set = datasets.MNIST('mnist/', train=True, download=True, transform=transform)
test_set = datasets.MNIST('mnist/', train=False, download=True, transform=transform)
train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_set, batch_size=batch_size, shuffle=False)
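One detail worth pointing out: Normalize((0.5,), (0.5,)) maps the pixels from [0, 1] (after ToTensor) to [-1, 1], which matches the Tanh output range of the Generator. A quick check you can run (my own addition, using the train_set defined above):

# Peek at one transformed sample: pixel values should fall within [-1, 1]
img, label = train_set[0]
print(img.shape, img.min().item(), img.max().item(), label)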
Here is the training code.
# Train
for epoch in range(epochs):
    epoch += 1

    for times, data in enumerate(train_loader):
        times += 1
        real_inputs = data[0].to(device)

        # Flatten the real images into 784-dim vectors for the Discriminator
        real_inputs = real_inputs.view(-1, 784)
        real_outputs = D(real_inputs)
        real_label = torch.ones(real_inputs.shape[0], 1).to(device)

        # Generate fake images from uniform noise in [-1, 1]
        noise = (torch.rand(real_inputs.shape[0], 128) - 0.5) / 0.5
        noise = noise.to(device)
        fake_inputs = G(noise)
        fake_outputs = D(fake_inputs)
        fake_label = torch.zeros(fake_inputs.shape[0], 1).to(device)

        outputs = torch.cat((real_outputs, fake_outputs), 0)
        targets = torch.cat((real_label, fake_label), 0)

        # Zero the parameter gradients
        d_optimizer.zero_grad()

        # Backward propagation
        d_loss = d_loss_function(outputs, targets)
        d_loss.backward()
        d_optimizer.step()

        # Generator
        noise = (torch.rand(real_inputs.shape[0], 128)-0.5)/0.5
        noise = noise.to(device)

        fake_inputs = G(noise)
        fake_outputs = D(fake_inputs)

        g_loss = g_loss_function(fake_outputs)
        g_optimizer.zero_grad()
        g_loss.backward()
        g_optimizer.step()

        if times % 100 == 0 or times == len(train_loader):
            print('[{}/{}, {}/{}] D_loss: {:.3f} G_loss: {:.3f}'.format(epoch, epochs, times, len(train_loader), d_loss.item(), g_loss.item()))

    # Show a few generated samples at the end of each epoch
    imgs_numpy = (fake_inputs.data.cpu().numpy()+1.0)/2.0
    show_images(imgs_numpy[:16])
    plt.show()

    # Save the Generator every 50 epochs
    if epoch % 50 == 0:
        torch.save(G, 'Generator_epoch_{}.pth'.format(epoch))
        print('Model saved.')


print('Training Finished.')
print('Cost Time: {}s'.format(time.time()-start_time))
Output:
The results are not perfect; feel free to try different configurations.
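If you want something concrete to try, one simple and common tweak (not used in the script above) is one-sided label smoothing: give the Discriminator soft targets like 0.9 for real images instead of 1.0, which often stabilizes GAN training. Inside the training loop, only the real-label line would change, roughly like this:

# Original hard labels for real images:
# real_label = torch.ones(real_inputs.shape[0], 1).to(device)

# One-sided label smoothing (an optional experiment): real images get a target of 0.9
real_label = torch.full((real_inputs.shape[0], 1), 0.9).to(device)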
Complete Code
model.py
# -*- coding: utf-8 -*-
import torch.nn as nn


class discriminator(nn.Module):
    def __init__(self):
        super(discriminator, self).__init__()
        self.main = nn.Sequential(
            nn.Linear(784, 256),
            nn.LeakyReLU(0.2),
            nn.Linear(256, 256),
            nn.LeakyReLU(0.2),
            nn.Linear(256, 1),
            nn.Sigmoid()
        )

    def forward(self, input):
        return self.main(input)


class generator(nn.Module):
    def __init__(self):
        super(generator, self).__init__()
        self.main = nn.Sequential(
            nn.Linear(128, 1024),
            nn.ReLU(),
            nn.Linear(1024, 1024),
            nn.ReLU(),
            nn.Linear(1024, 784),
            nn.Tanh()
        )

    def forward(self, input):
        return self.main(input)
mnist_train.py
# -*- coding: utf-8 -*-
import time
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import transforms
from model import discriminator, generator
import numpy as np
import matplotlib.pyplot as plt


start_time = time.time()
plt.rcParams['image.cmap'] = 'gray'


def show_images(images):
    sqrtn = int(np.ceil(np.sqrt(images.shape[0])))
    for index, image in enumerate(images):
        plt.subplot(sqrtn, sqrtn, index+1)
        plt.imshow(image.reshape(28, 28))


# Discriminator Loss => BCELoss
def d_loss_function(inputs, targets):
    return nn.BCELoss()(inputs, targets)


def g_loss_function(inputs):
    targets = torch.ones([inputs.shape[0], 1])
    targets = targets.to(device)
    return nn.BCELoss()(inputs, targets)


# GPU
device = 'cuda:0' if torch.cuda.is_available() else 'cpu'
print('GPU State:', device)

# Model
G = generator().to(device)
D = discriminator().to(device)
print(G)
print(D)

# Settings
epochs = 200
lr = 0.0002
batch_size = 64
g_optimizer = optim.Adam(G.parameters(), lr=lr, betas=(0.5, 0.999))
d_optimizer = optim.Adam(D.parameters(), lr=lr, betas=(0.5, 0.999))

# Transform
transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.5,), (0.5,))])

# Load data
train_set = datasets.MNIST('mnist/', train=True, download=True, transform=transform)
test_set = datasets.MNIST('mnist/', train=False, download=True, transform=transform)
train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_set, batch_size=batch_size, shuffle=False)


# Train
for epoch in range(epochs):
    epoch += 1

    for times, data in enumerate(train_loader):
        times += 1
        real_inputs = data[0].to(device)

        # Flatten the real images into 784-dim vectors for the Discriminator
        real_inputs = real_inputs.view(-1, 784)
        real_outputs = D(real_inputs)
        real_label = torch.ones(real_inputs.shape[0], 1).to(device)

        # Generate fake images from uniform noise in [-1, 1]
        noise = (torch.rand(real_inputs.shape[0], 128) - 0.5) / 0.5
        noise = noise.to(device)
        fake_inputs = G(noise)
        fake_outputs = D(fake_inputs)
        fake_label = torch.zeros(fake_inputs.shape[0], 1).to(device)

        outputs = torch.cat((real_outputs, fake_outputs), 0)
        targets = torch.cat((real_label, fake_label), 0)

        # Zero the parameter gradients
        d_optimizer.zero_grad()

        # Backward propagation
        d_loss = d_loss_function(outputs, targets)
        d_loss.backward()
        d_optimizer.step()

        # Generator
        noise = (torch.rand(real_inputs.shape[0], 128)-0.5)/0.5
        noise = noise.to(device)

        fake_inputs = G(noise)
        fake_outputs = D(fake_inputs)

        g_loss = g_loss_function(fake_outputs)
        g_optimizer.zero_grad()
        g_loss.backward()
        g_optimizer.step()

        if times % 100 == 0 or times == len(train_loader):
            print('[{}/{}, {}/{}] D_loss: {:.3f} G_loss: {:.3f}'.format(epoch, epochs, times, len(train_loader), d_loss.item(), g_loss.item()))

    # Show a few generated samples at the end of each epoch
    imgs_numpy = (fake_inputs.data.cpu().numpy()+1.0)/2.0
    show_images(imgs_numpy[:16])
    plt.show()

    # Save the Generator every 50 epochs
    if epoch % 50 == 0:
        torch.save(G, 'Generator_epoch_{}.pth'.format(epoch))
        print('Model saved.')


print('Training Finished.')
print('Cost Time: {}s'.format(time.time()-start_time))
test.py
import torch
from torchvision import transforms
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec

plt.rcParams['figure.figsize'] = (10.0, 8.0)
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'


def show_images(images):
    sqrtn = int(np.ceil(np.sqrt(images.shape[0])))
    for index, image in enumerate(images):
        plt.subplot(sqrtn, sqrtn, index+1)
        plt.imshow(image.reshape(28, 28))


device = 'cuda:0' if torch.cuda.is_available() else 'cpu'
print('GPU State:', device)

# Model
# map_location makes the checkpoint loadable even on a CPU-only machine
G = torch.load('Generator_epoch_200.pth', map_location=device)
G.eval()

# Generator
noise = (torch.rand(16, 128)-0.5) / 0.5
noise = noise.to(device)

fake_image = G(noise)
imgs_numpy = (fake_image.data.cpu().numpy()+1.0)/2.0
show_images(imgs_numpy)
plt.show()
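A side note, not from the original scripts: instead of pickling the whole model with torch.save(G, ...), many PyTorch users save only the weights via state_dict, which is more robust when the class definition changes. A rough sketch of how the save/load pair above could be rewritten (same file name assumed):

import torch
from model import generator

device = 'cuda:0' if torch.cuda.is_available() else 'cpu'

# Saving (in mnist_train.py): store only the parameters
# torch.save(G.state_dict(), 'Generator_epoch_{}.pth'.format(epoch))

# Loading (in test.py): rebuild the model, then load the parameters into it
G = generator().to(device)
G.load_state_dict(torch.load('Generator_epoch_200.pth', map_location=device))
G.eval()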
If reading the code on this page is inconvenient, you can also find it on my GitHub: https://github.com/ccs96307/PyTorch-Mnist-GAN