Today, if you are preprocessing some machine learning data, you may need to convert a PyTorch tensor to one-hot encoding. An intuitive method is to convert the TENSOR to a NUMPY-ARRAY, and then convert the NUMPY-ARRAY to one-hot encoding, just like in this article: [Python] Convert the value to one-hot type in Numpy
But you can also consider converting the PyTorch Tensor to one-hot encoding directly, as sketched below. (https://discuss.pytorch.org/t/convert-int-into-one-hot-format/507)
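For reference, a minimal sketch of the detour through NumPy might look like this (assuming labels in the range 0-3, so 4 classes):

import numpy as np
import torch

y = torch.tensor([1, 2, 3])
y_np = y.numpy()                          # Tensor -> NumPy array
y_onehot_np = np.eye(4)[y_np]             # one-hot via identity-matrix row indexing
y_onehot = torch.from_numpy(y_onehot_np)  # back to a PyTorch Tensor
print(y_onehot)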
What is one-hot encoding
Before we start, I want to introduce what one-hot encoding is.
Assume we have the following array:
[1, 2, 3]
If we convert the array to one-hot encoding, it will look like this:
[[0, 1, 0, 0],
[0, 0, 1, 0],
[0, 0, 0, 1]]
Since the index starts from 0, the one-hot encoding takes the form shown above.
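As a side note, recent PyTorch versions also ship a built-in helper, torch.nn.functional.one_hot(), which produces the same result; a quick sketch:

import torch
import torch.nn.functional as F

labels = torch.tensor([1, 2, 3])
print(F.one_hot(labels, num_classes=4))
# tensor([[0, 1, 0, 0],
#         [0, 0, 1, 0],
#         [0, 0, 0, 1]])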
Use scatter_() to convert
Assume we have the following tensor:
import torch

y = torch.tensor([[1], [2], [3]])
print(y)
Output:
tensor([[1],
[2],
[3]])
If our labels look like that, it means we are training our model with batch_size=3. That is unusual, but let's pretend it is real.
We initialize an all-zero tensor:
batch_size = 3
length = 4
y_onehot = torch.zeros([batch_size, length])
print(y_onehot)
Output:
tensor([[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.]])
Since the index always starts at 0, a label with the largest index = 3 means the one-hot matrix needs length = 4 columns.
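If you don't want to hard-code length, one common way is to derive it from the largest label (a small sketch, assuming labels start at 0):

length = int(y.max().item()) + 1   # largest label is 3, so length = 4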
Then we use scatter_() to convert it.
print(y_onehot.scatter_(1, y, 1))
Output:
tensor([[0., 1., 0., 0.],
[0., 0., 1., 0.],
[0., 0., 0., 1.]])
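What scatter_(1, y, 1) does here: along dimension 1 (the columns), it writes the value 1 at the column index given by y for each row. A rough loop equivalent, just for illustration:

# Equivalent to y_onehot.scatter_(1, y, 1) for this 2-D case:
for row in range(batch_size):
    y_onehot[row, y[row, 0]] = 1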
We're done!