When we use an RNN (such as an LSTM or GRU) with the Embedding layer provided by PyTorch, we often need to accept input sentences of many different lengths.
Many people recommended pack_padded_sequence
and pad_packed_sequence
for handling variable-length sequences,
so I plan to record how to use them here.
In addition, I demonstrate the pad()
function in PyTorch for padding a sentence to a fixed length, and torch.cat()
for concatenating different sequences.
Sample Code
Simply put, pack_padded_sequence()
compresses padded sequences, and pad_packed_sequence()
decompresses them back to the original padded form.
The following is a simple example.
# coding: utf-8
import torch
import torch.nn.functional as F
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

# Sequences
a = torch.tensor([1, 2])
b = torch.tensor([3, 4, 5])
c = torch.tensor([6, 7, 8, 9])
print('a:', a)
print('b:', b)
print('c:', c)

# Settings
seq_lens = [len(a), len(b), len(c)]
max_len = max(seq_lens)

# Zero padding
a = F.pad(a, (0, max_len-len(a)))
b = F.pad(b, (0, max_len-len(b)))
c = F.pad(c, (0, max_len-len(c)))

# Merge the sequences
seq = torch.cat((a, b, c), 0).view(-1, max_len)
print('Sequence:', seq)

# Pack
packed_seq = pack_padded_sequence(seq, seq_lens, batch_first=True, enforce_sorted=False)
print('Pack:', packed_seq)

# Unpack
unpacked_seq, unpacked_lens = pad_packed_sequence(packed_seq, batch_first=True)
print('Unpack:', unpacked_seq)
print('length:', unpacked_lens)

# Restore the original sequences
a = unpacked_seq[0][:unpacked_lens[0]]
b = unpacked_seq[1][:unpacked_lens[1]]
c = unpacked_seq[2][:unpacked_lens[2]]
print('Reductions:')
print('a:', a)
print('b:', b)
print('c:', c)
Output:
a: tensor([1, 2])
b: tensor([3, 4, 5])
c: tensor([6, 7, 8, 9])
Sequence:
tensor([[1, 2, 0, 0],
[3, 4, 5, 0],
[6, 7, 8, 9]])
Pack:
PackedSequence(data=tensor([6, 3, 1, 7, 4, 2, 8, 5, 9]),
batch_sizes=tensor([3, 3, 2, 1]),
sorted_indices=tensor([2, 1, 0]),
unsorted_indices=tensor([2, 1, 0]))
Unpack:
tensor([[1, 2, 0, 0],
[3, 4, 5, 0],
[6, 7, 8, 9]])
length: tensor([2, 3, 4])
Reductions:
a: tensor([1, 2])
b: tensor([3, 4, 5])
c: tensor([6, 7, 8, 9])
If I have three sequences of different lengths, I just need to:
- Record the length of every sequence
- Decide on a fixed max length
- Pad the sequences to the fixed length
- Use pack_padded_sequence() to compress the sequences
- Use pad_packed_sequence() to decompress the sequences
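Incidentally, PyTorch also provides torch.nn.utils.rnn.pad_sequence(), which performs the padding and merging steps above in a single call. A minimal sketch using the same three tensors:

```python
import torch
from torch.nn.utils.rnn import pad_sequence

a = torch.tensor([1, 2])
b = torch.tensor([3, 4, 5])
c = torch.tensor([6, 7, 8, 9])

# pad_sequence() pads every tensor to the longest one (with 0 by default)
# and stacks them; batch_first=True gives shape (batch, max_len)
seq = pad_sequence([a, b, c], batch_first=True)
print(seq)
```

This replaces the manual F.pad() and torch.cat() calls, though doing it by hand makes the mechanics clearer.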
As we can see, we can recover each sequence to its original form.
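To show why packing matters in practice, here is a minimal sketch of pushing a packed batch through an nn.LSTM. The vocabulary, embedding, and hidden sizes are hypothetical, chosen only for illustration:

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

vocab_size, embed_dim, hidden_dim = 10, 8, 16  # hypothetical sizes
embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)

# Padded batch and the true lengths, as in the example above
seq = torch.tensor([[1, 2, 0, 0],
                    [3, 4, 5, 0],
                    [6, 7, 8, 9]])
seq_lens = [2, 3, 4]

# Pack the embedded batch so the LSTM skips the padded time steps
packed = pack_padded_sequence(embedding(seq), seq_lens,
                              batch_first=True, enforce_sorted=False)
packed_out, (h, c) = lstm(packed)

# Unpack back to a padded tensor of per-step outputs
out, out_lens = pad_packed_sequence(packed_out, batch_first=True)
print(out.shape)   # (3, 4, 16): (batch, max_len, hidden_dim)
print(out_lens)    # tensor([2, 3, 4])
```

Without packing, the LSTM would also run over the zero padding and the final hidden state of the shorter sequences would be polluted by those padded steps.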
References
- https://pytorch.org/docs/master/generated/torch.nn.utils.rnn.pad_packed_sequence.html
- https://github.com/pytorch/pytorch/issues/1128