[PyTorch] Notes on Extracting Model Weights or Model Layers

Last Updated on 2021-07-06 by Clay

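Building a model with PyTorch is quick and convenient. Beyond building and training, though, PyTorch also lets us take an already trained model and export the "weights" of its neural network, or pull out just one of its "layers".

What is the benefit? As I understand it, besides loading a trained model and continuing to train it, we can sometimes take the original model apart and reuse the pieces in different tasks, which makes for a very flexible training workflow.

That description may be too abstract, so here is a case I actually ran into: I used BERT as the first layer of my model. That is, I took a pre-trained model released by another team, fine-tuned it as part of my own model, and ended up with a classifier.

I was not entirely happy with that classifier, because I wanted to add more features to the training data. The catch was that I was quite satisfied with the fine-tuned embedding layer and did not want to change it. In that situation I can load the original model and extract only its first layer, the embedding layer. Why only the first layer? Because the later layers were built specifically for the original classification task and are not needed in the next classifier, the one that takes additional features.

In other words, my original model consists of the embedding layer followed by the classification layers, but I no longer need the classification layers.

So I only need to extract the first layer, and then I can move on to my next training task.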

Extracting Weights

Let's start with how to extract a model's weights. The classification model I defined is structured as follows:

# coding: utf-8
import torch.nn as nn


# Settings
vector_size = 300


# GRU
class GRU(nn.Module):
    def __init__(self):
        super(GRU, self).__init__()
        self.gru = nn.GRU(
            input_size=vector_size,
            hidden_size=vector_size,
            num_layers=5,
            dropout=0.3,
            bidirectional=True,
            batch_first=True,
        )

        self.fc = nn.Linear(vector_size, 1)
        self.sigmoid = nn.Sigmoid()

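    def forward(self, inputs):
        # hidden: (num_layers * 2, batch, hidden_size) because the GRU is bidirectional
        out, hidden = self.gru(inputs, None)

        # Final hidden state of the last layer's reverse direction: (batch, hidden_size)
        hidden = hidden[-1]
        # Note: squeeze(0) only has an effect when batch == 1, where it drops the batch axis
        outputs = self.fc(hidden.squeeze(0))

        return self.sigmoid(outputs)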

The structure is very simple: a GRU layer (5 stacked bidirectional layers), a fully connected layer, and a Sigmoid() activation function.
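To make the shapes inside forward() easier to follow, here is a minimal sketch of what a bidirectional, multi-layer nn.GRU returns. The small hidden size of 8 and the batch of 2 are my own assumptions so the snippet runs quickly; the model above uses 300.

```python
import torch
import torch.nn as nn

# Hypothetical small sizes; the article's model uses vector_size = 300.
hidden_size = 8
num_layers = 5

gru = nn.GRU(
    input_size=hidden_size,
    hidden_size=hidden_size,
    num_layers=num_layers,
    bidirectional=True,
    batch_first=True,
)

inputs = torch.ones(2, 31, hidden_size)  # (batch, sequence length, features)
out, hidden = gru(inputs)

# out holds every time step of the last layer, forward and reverse concatenated.
print(out.shape)         # torch.Size([2, 31, 16])
# hidden holds the final state of every layer and direction (never batch-first).
print(hidden.shape)      # torch.Size([10, 2, 8])
# hidden[-1] is the last layer's reverse direction: one vector per batch item.
print(hidden[-1].shape)  # torch.Size([2, 8])
```

This is why the fully connected layer takes vector_size inputs rather than twice that: only one direction's final state is passed on.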

With this architecture I have already trained a classifier and saved it as "gru_model.pth". Below is how I load the trained model and print its weights.

# coding: utf-8
import torch
from GRU_300 import GRU


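# Load pre-trained model. The whole model object was pickled, so the GRU class
# imported above must be importable here. Recent PyTorch releases may also
# require torch.load(..., weights_only=False) for a full pickled model.
model_a = torch.load('./gru_model.pth').cpu()
model_a.eval()


# Display the name and shape of every parameter
for name, para in model_a.named_parameters():
    print('{}: {}'.format(name, para.shape))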

Output:

gru.weight_ih_l0: torch.Size([900, 300])
gru.weight_hh_l0: torch.Size([900, 300])
gru.bias_ih_l0: torch.Size([900])
gru.bias_hh_l0: torch.Size([900])
gru.weight_ih_l0_reverse: torch.Size([900, 300])
gru.weight_hh_l0_reverse: torch.Size([900, 300])
gru.bias_ih_l0_reverse: torch.Size([900])
gru.bias_hh_l0_reverse: torch.Size([900])
gru.weight_ih_l1: torch.Size([900, 600])
gru.weight_hh_l1: torch.Size([900, 300])
gru.bias_ih_l1: torch.Size([900])
gru.bias_hh_l1: torch.Size([900])
gru.weight_ih_l1_reverse: torch.Size([900, 600])
gru.weight_hh_l1_reverse: torch.Size([900, 300])
gru.bias_ih_l1_reverse: torch.Size([900])
gru.bias_hh_l1_reverse: torch.Size([900])
gru.weight_ih_l2: torch.Size([900, 600])
gru.weight_hh_l2: torch.Size([900, 300])
gru.bias_ih_l2: torch.Size([900])
gru.bias_hh_l2: torch.Size([900])
gru.weight_ih_l2_reverse: torch.Size([900, 600])
gru.weight_hh_l2_reverse: torch.Size([900, 300])
gru.bias_ih_l2_reverse: torch.Size([900])
gru.bias_hh_l2_reverse: torch.Size([900])
gru.weight_ih_l3: torch.Size([900, 600])
gru.weight_hh_l3: torch.Size([900, 300])
gru.bias_ih_l3: torch.Size([900])
gru.bias_hh_l3: torch.Size([900])
gru.weight_ih_l3_reverse: torch.Size([900, 600])
gru.weight_hh_l3_reverse: torch.Size([900, 300])
gru.bias_ih_l3_reverse: torch.Size([900])
gru.bias_hh_l3_reverse: torch.Size([900])
gru.weight_ih_l4: torch.Size([900, 600])
gru.weight_hh_l4: torch.Size([900, 300])
gru.bias_ih_l4: torch.Size([900])
gru.bias_hh_l4: torch.Size([900])
gru.weight_ih_l4_reverse: torch.Size([900, 600])
gru.weight_hh_l4_reverse: torch.Size([900, 300])
gru.bias_ih_l4_reverse: torch.Size([900])
gru.bias_hh_l4_reverse: torch.Size([900])
fc.weight: torch.Size([1, 300])
fc.bias: torch.Size([1])

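Calling "named_parameters()" yields each parameter's name together with its weights. To keep the output readable I printed only the shape of each weight tensor; you can print the tensors themselves to see the actual values.

(Note: GRU_300 is the Python file in which I define the model.)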
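If the 900 and 600 in the output look mysterious, the following sketch may help. It rebuilds only the GRU layer (with 2 layers instead of 5, my own simplification) and checks where those numbers come from: a GRU stacks three gates along the first axis, and from the second layer on the input is the forward and reverse outputs concatenated.

```python
import torch.nn as nn

hidden_size = 300  # same value as vector_size in the article

gru = nn.GRU(
    input_size=hidden_size,
    hidden_size=hidden_size,
    num_layers=2,          # two layers are enough to show the pattern
    bidirectional=True,
    batch_first=True,
)

shapes = {name: tuple(p.shape) for name, p in gru.named_parameters()}

# Three gates (reset, update, new) are stacked along the first axis: 3 * 300 = 900.
print(shapes["weight_ih_l0"] == (3 * hidden_size, hidden_size))      # True
# Layer 1 receives both directions of layer 0: 2 * 300 = 600 input features.
print(shapes["weight_ih_l1"] == (3 * hidden_size, 2 * hidden_size))  # True

# Total number of trainable values in this 2-layer GRU.
total = sum(p.numel() for p in gru.parameters())
print(total)
```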

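That covers printing a model's weights. Next, I will walk through how to make a brand-new model inherit the pre-trained weights.

First, use the same "named_parameters()" function to get the weights; this time we store them in a dict().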

# coding: utf-8
import torch
from GRU_300 import GRU


# Load pre-trained model
model_a = torch.load('./gru_model.pth').cpu()
model_a.eval()


# Collect every parameter into a dict keyed by its name
weights = dict()
for name, para in model_a.named_parameters():
    weights[name] = para

Here I store every parameter name and its weights in weights.

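# Build a new model with freshly initialized weights
model_b = GRU().cpu()
# Start from the new model's own state_dict, then overwrite matching entries
model_b_weight = model_b.state_dict()
model_b_weight.update(weights)
# Copy the merged values into model_b
model_b.load_state_dict(model_b_weight)
model_b.eval()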

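First I build a new GRU model and call "state_dict()" to get its dictionary of parameter names and tensors, which has exactly the layout the model expects. Then I use update() to overwrite the entries of model_b_weight with the weights extracted from the pre-trained model.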
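When both models share the same class, the same transfer can probably be written in one step by passing one model's state_dict() straight to the other. The sketch below uses a tiny stand-in class (TinyNet is a hypothetical name of mine, not the GRU model from this post) so it runs without the saved file.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)


# Hypothetical stand-in for the article's GRU class, kept tiny on purpose.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 1)

    def forward(self, x):
        return torch.sigmoid(self.fc(x))


model_a = TinyNet().eval()  # plays the role of the trained model
model_b = TinyNet().eval()  # freshly initialized, different random weights

# state_dict() maps each parameter (and buffer) name to its tensor.
print(list(model_b.state_dict().keys()))  # ['fc.weight', 'fc.bias']

# load_state_dict copies the values into model_b's own tensors.
model_b.load_state_dict(model_a.state_dict())

x = torch.ones(1, 4)
print(torch.equal(model_a(x), model_b(x)))  # True
```

One difference worth knowing: state_dict() also includes buffers (for example BatchNorm running statistics), while named_parameters() does not.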

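model_b_weight is now a state dict the new model can accept, so we call load_state_dict() to load it into the new model. The two models should now be identical. Below, we feed both models the same dummy input (a tensor of ones) and check whether their outputs match. Because both models are in eval() mode, dropout is disabled and the outputs are deterministic.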

# Test
inputs = torch.ones([1, 31, 300])

outputs_a = model_a.gru(inputs)
outputs_b = model_b.gru(inputs)

print(outputs_a[0]==outputs_b[0])

Output:

tensor([[[True, True, True,  ..., True, True, True],
         [True, True, True,  ..., True, True, True],
         [True, True, True,  ..., True, True, True],
         ...,
         [True, True, True,  ..., True, True, True],
         [True, True, True,  ..., True, True, True],
         [True, True, True,  ..., True, True, True]]])

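As you can see, the two models' outputs really are identical, which confirms that the new model has taken over the pre-trained model's weights.

Note that the shape of the dummy input simply follows the shape of my real training data, so don't worry too much about where the 31 comes from, haha.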
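For the scenario from the introduction, where only the first layer should be kept and the classification head is replaced, one possible approach is to filter the state dict by name and load it with strict=False. This is a sketch under my own assumptions: OldModel, NewModel, and the extra 3 features are hypothetical, and forward() is omitted because only the parameters matter here.

```python
import torch
import torch.nn as nn


class OldModel(nn.Module):  # hypothetical: encoder plus the original head
    def __init__(self):
        super().__init__()
        self.gru = nn.GRU(8, 8, batch_first=True)
        self.fc = nn.Linear(8, 1)


class NewModel(nn.Module):  # hypothetical: same encoder, head widened for 3 extra features
    def __init__(self):
        super().__init__()
        self.gru = nn.GRU(8, 8, batch_first=True)
        self.fc = nn.Linear(8 + 3, 1)


old_model = OldModel()
new_model = NewModel()

# Keep only the entries that belong to the GRU layer.
gru_weights = {
    name: tensor
    for name, tensor in old_model.state_dict().items()
    if name.startswith("gru.")
}

# strict=False tolerates keys missing from the dict (here the new head).
result = new_model.load_state_dict(gru_weights, strict=False)
print(result.missing_keys)  # ['fc.weight', 'fc.bias']

# Optionally freeze the transferred layer so further training leaves it unchanged.
for p in new_model.gru.parameters():
    p.requires_grad = False
```

Checking missing_keys afterwards is a cheap way to confirm that only the layers you meant to leave untouched were skipped.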

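Extracting a Model Layer's Output

Getting the output of a specific layer is even simpler; in fact, the code above already does it. To recap, here is my model architecture: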

# coding: utf-8
import torch.nn as nn


# Settings
vector_size = 300


# GRU
class GRU(nn.Module):
    def __init__(self):
        super(GRU, self).__init__()
        self.gru = nn.GRU(
            input_size=vector_size,
            hidden_size=vector_size,
            num_layers=5,
            dropout=0.3,
            bidirectional=True,
            batch_first=True,
        )

        self.fc = nn.Linear(vector_size, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, inputs):
        out, hidden = self.gru(inputs, None)

        hidden = hidden[-1]
        outputs = self.fc(hidden.squeeze(0))
       
        return self.sigmoid(outputs)

As mentioned above, the architecture has three parts: the GRU, the fully connected layer, and the Sigmoid. So what if I only want the fully connected layer?

# coding: utf-8
import torch
from GRU_300 import GRU


# Load pre-trained model
model = torch.load('./gru_model.pth').cpu()
model.eval()


# Inputs
inputs = torch.ones([1, 300])
outputs = model.fc(inputs)

print('Inputs:', inputs.shape)
print('Outputs:', outputs.shape)

Output:

Inputs: torch.Size([1, 300])
Outputs: torch.Size([1, 1])

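That's right: in the model definition, the fully connected layer is named "fc".

So we can reach the layer directly through the model's "fc" attribute and call it like a function. As the output shows, a 300-dimensional input passes through the fully connected layer and comes out as a 1-dimensional output, exactly as the model was designed.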
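If you don't remember the layer names, or you want a layer's output while running the full model rather than feeding the layer by hand, something like the following might help. It lists submodules with named_children() and uses a forward hook, which is a different technique from the direct attribute call above. TinyNet is again a hypothetical stand-in so the snippet runs on its own.

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):  # hypothetical stand-in for the article's model
    def __init__(self):
        super().__init__()
        self.gru = nn.GRU(4, 4, batch_first=True)
        self.fc = nn.Linear(4, 1)

    def forward(self, x):
        _, hidden = self.gru(x)
        return torch.sigmoid(self.fc(hidden[-1]))


model = TinyNet().eval()

# Submodules are registered under the attribute names used in __init__.
print([name for name, _ in model.named_children()])  # ['gru', 'fc']

# A forward hook stores what a layer produced during a normal forward pass.
captured = {}


def save_output(module, inputs, output):
    captured["fc"] = output.detach()


handle = model.fc.register_forward_hook(save_output)
with torch.no_grad():
    prediction = model(torch.ones(2, 5, 4))
handle.remove()  # remove the hook once it is no longer needed

print(captured["fc"].shape)  # torch.Size([2, 1])
print(prediction.shape)      # torch.Size([2, 1])
```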

That wraps up these brief notes on extracting "weights" or "layers" in PyTorch.
