
[PyTorch] Convert Tensor to One-Hot Encoding Type

If you are preprocessing machine learning data, you may need to convert a PyTorch tensor to a one-hot encoding. One intuitive method is to convert the tensor to a NumPy array and then convert that array to a one-hot encoding, as described in this article: [Python] Convert the value to one-hot type in Numpy
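As a rough illustration of that NumPy detour, here is a minimal sketch. It is not necessarily what the linked article does: indexing an identity matrix with np.eye is just one common way to do it, and num_classes = 4 is an assumption that the labels lie in the range 0 to 3.

```python
import numpy as np
import torch

y = torch.tensor([1, 2, 3])

# Step 1: tensor -> NumPy array
labels = y.numpy()

# Step 2: pick row `label` of an identity matrix for every label.
# num_classes is an assumed value: labels are expected to be 0..3.
num_classes = 4
onehot_np = np.eye(num_classes, dtype=np.float32)[labels]

# Step 3: NumPy array -> tensor
onehot = torch.from_numpy(onehot_np)
print(onehot)
```

This works, but it needs two conversions that the direct PyTorch approach avoids.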

However, you can also convert a PyTorch tensor to a one-hot encoding directly, as discussed here: https://discuss.pytorch.org/t/convert-int-into-one-hot-format/507


What is one-hot encoding

Before we start, I want to introduce what one-hot encoding is.

Assume we have the following array:

[1, 2, 3]


If we convert the array to a one-hot encoding, it will look like this:

[[0, 1, 0, 0],
 [0, 0, 1, 0],
 [0, 0, 0, 1]]

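Indices start from 0, so the value 1 sets position 1 of its row, the value 2 sets position 2, and the value 3 sets position 3. Every other position stays 0. That is one-hot encoding.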
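To make the definition concrete, here is a minimal sketch in plain Python, without PyTorch. The helper name one_hot and its num_classes parameter are hypothetical and only used for illustration.

```python
def one_hot(labels, num_classes):
    """Return one row per label, with a 1 at the label's index and 0 elsewhere."""
    rows = []
    for label in labels:
        row = [0] * num_classes  # start with an all-zero row
        row[label] = 1           # mark the position given by the label
        rows.append(row)
    return rows


encoded = one_hot([1, 2, 3], num_classes=4)
for row in encoded:
    print(row)
```

Each printed row corresponds to one line of the matrix shown above.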


Use scatter_() to convert

Assume we have the following tensor:

import torch
y = torch.tensor([[1], [2], [3]])
print(y)



Output:

tensor([[1],
        [2],
        [3]])

If our labels look like this, it means we are training the model with batch_size=3. That is rare, but let's pretend it is real.

We initialize an all-zero tensor:

batch_size = 3
length = 4
y_onehot = torch.zeros([batch_size, length])
print(y_onehot)



Output:

tensor([[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]])

Since indices always start at 0, a label with index 3 requires a matrix with length = 4 columns.

Then we use scatter_() to convert it:
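Instead of hard-coding the sizes, they can be derived from the label tensor. The following is a minimal sketch under one assumption: the largest label in y is also the largest class index. In a real dataset a single batch may not contain every class, so the known number of classes is usually the safer choice.

```python
import torch

y = torch.tensor([[1], [2], [3]])

batch_size = y.size(0)     # one row per label
length = int(y.max()) + 1  # assumed: the largest label in y is the last class index

y_onehot = torch.zeros(batch_size, length)
print(y_onehot.shape)
```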

# scatter_(dim, index, value) works in place: for each row i, write 1 into column y[i]
print(y_onehot.scatter_(1, y, 1))



Output:

tensor([[0., 1., 0., 0.],
        [0., 0., 1., 0.],
        [0., 0., 0., 1.]])
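To see what scatter_() does here, it can help to compare it with an explicit loop. This is only an illustrative sketch: the variable names via_scatter and via_loop are made up for the comparison, and the loop is far slower than scatter_() on real data.

```python
import torch

y = torch.tensor([[1], [2], [3]])

# In-place scatter along dim=1: result[i][y[i][j]] = 1
via_scatter = torch.zeros(3, 4).scatter_(1, y, 1)

# The same assignment written as an explicit loop
via_loop = torch.zeros(3, 4)
for i in range(y.size(0)):
    for j in range(y.size(1)):
        via_loop[i, y[i, j]] = 1

print(torch.equal(via_scatter, via_loop))
```

The index tensor y must have the same number of dimensions as the target, which is why the labels are stored as a column of shape (3, 1) rather than as a flat vector.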

We're done!
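As a side note that is not part of the approach above, recent PyTorch versions also provide torch.nn.functional.one_hot. A minimal sketch, assuming the same labels and 4 classes:

```python
import torch
import torch.nn.functional as F

y = torch.tensor([[1], [2], [3]])

# one_hot appends the class dimension, so drop the extra column dimension first
labels = y.squeeze(1)  # shape (3,)

# The result is an integer (int64) tensor; cast to float to match the scatter_() output
y_onehot = F.one_hot(labels, num_classes=4).float()
print(y_onehot)
```

Unlike scatter_(), this returns a new tensor instead of modifying an existing one in place.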
