Last Updated on 2024-08-21 by Clay
Problem Description
When building deep learning models in PyTorch, adjusting layer shapes and input/output dimensions is something every AI engineer has to deal with. However, there is a small but interesting pitfall in PyTorch's view() method:
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
Intuitively, PyTorch requires the elements of a tensor to be stored contiguously in memory when using .view() to change its shape. However, after certain operations such as .transpose() and .permute(), the memory layout of the tensor might no longer be contiguous: these operations only change the tensor's strides, without moving any data.
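A quick way to see this is to inspect is_contiguous() and stride() before and after a transpose. The small tensor below is illustrative only, not taken from the original post:

```python
import torch

x = torch.rand(2, 3)
print(x.is_contiguous())  # True: rows are stored one after another
print(x.stride())         # (3, 1)

y = x.transpose(0, 1)     # no data is moved; only the strides are swapped
print(y.is_contiguous())  # False
print(y.stride())         # (1, 3)
```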
Solution
Therefore, after applying methods like .transpose() or .permute(), if you need to use view() to reshape the tensor, you must first call .contiguous() to ensure the tensor is stored contiguously in memory.
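As the error message itself hints, .reshape() is an alternative: it behaves like .view() on contiguous tensors and makes a copy otherwise. A minimal sketch (the tensor and its shape here are illustrative, not from the original code):

```python
import torch

x = torch.rand(2, 3, 4).permute(0, 2, 1)  # non-contiguous after permute

# Option 1: copy into contiguous memory first, then view
a = x.contiguous().view(2, 12)

# Option 2: reshape() works either way, copying only when necessary
b = x.reshape(2, 12)

print(torch.equal(a, b))  # True
```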
Below is an example that reproduces the error. Suppose that after computing multi-head attention, we want to merge the heads back into the original hidden_size dimension.
import torch
batch_size = 16
seq_length = 512
num_head = 2
hidden_size = 768
inputs = torch.rand(batch_size, num_head, seq_length, hidden_size // num_head)
print("Shape:", inputs.shape)
inputs = inputs.permute(0, 2, 1, 3)
print("Permute Shape:", inputs.shape)
inputs = inputs.view(batch_size, seq_length, hidden_size)
print("Merge multi-head Shape:", inputs.shape)
Output:
Shape: torch.Size([16, 2, 512, 384])
Permute Shape: torch.Size([16, 512, 2, 384])
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
/home/clay/Projects/machine_learning/transformers_from_scratch/analysis.ipynb Cell 12 line 1
     11 inputs = inputs.permute(0, 2, 1, 3)
     12 print("Permute Shape:", inputs.shape)
---> 14 inputs = inputs.view(batch_size, seq_length, hidden_size)
     15 print("Merge multi-head Shape:", inputs.shape)

RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
This issue arises because the tensor is not contiguous in memory. Once we apply .contiguous(), the reshaping proceeds smoothly.
import torch
batch_size = 16
seq_length = 512
num_head = 2
hidden_size = 768
inputs = torch.rand(batch_size, num_head, seq_length, hidden_size // num_head)
print("Shape:", inputs.shape)
# Wrong
# inputs = inputs.permute(0, 2, 1, 3)
# Correct
inputs = inputs.permute(0, 2, 1, 3).contiguous()
print("Permute Shape:", inputs.shape)
inputs = inputs.view(batch_size, seq_length, hidden_size)
print("Merge multi-head Shape:", inputs.shape)
Output:
Shape: torch.Size([16, 2, 512, 384])
Permute Shape: torch.Size([16, 512, 2, 384])
Merge multi-head Shape: torch.Size([16, 512, 768])
As we can see, the multi-head attention outputs have been successfully merged back together.
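For completeness, the same merge can also be written in one step with .reshape(), which is exactly what the error message recommends. A sketch using the same dimensions as above:

```python
import torch

batch_size, seq_length, num_head, hidden_size = 16, 512, 2, 768

inputs = torch.rand(batch_size, num_head, seq_length, hidden_size // num_head)

# reshape() handles the non-contiguous result of permute() automatically
merged = inputs.permute(0, 2, 1, 3).reshape(batch_size, seq_length, hidden_size)
print(merged.shape)  # torch.Size([16, 512, 768])
```

Whether you prefer .contiguous().view() or .reshape() is mostly a matter of taste; the explicit .contiguous() call makes the extra memory copy visible in the code.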