
[Solved][PyTorch] RuntimeError: CUDA out of memory. Tried to allocate 2.0 GiB

Today I want to record a common problem and its solution. Simply put, the error message is as follows:

RuntimeError: CUDA out of memory. Tried to allocate 2.0 GiB.

This error is actually very simple: the GPU does not have enough memory to hold the data we want to train on, so the program stops unexpectedly.

On Linux, the memory shown by the nvidia-smi command is GPU memory, while the memory shown by the htop command is the system RAM used by ordinary programs; the two are different.
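If you want to check this from inside Python rather than the terminal, here is a minimal sketch (assuming a CUDA-capable GPU at device index 0) that prints the total GPU memory and the amount PyTorch has currently allocated on it. This is the same pool of memory reported by nvidia-smi, not system RAM.

import torch

# Total memory of GPU 0 and the amount PyTorch has currently allocated on it
total = torch.cuda.get_device_properties(0).total_memory
allocated = torch.cuda.memory_allocated(0)

print(f"GPU total memory    : {total / 1024**3:.2f} GiB")
print(f"GPU allocated memory: {allocated / 1024**3:.2f} GiB")
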


Solution

If you encounter this problem during training, the cause is usually a batch size that is too large. Just imagine: if a huge amount of data is handed to the GPU at once, is it not easy for its memory to overflow?

Conversely, if each batch loaded onto the GPU is smaller, and it is released after training before the next batch comes in, the GPU memory will not overflow.

So during the training phase, reducing the batch size is a method worth considering, for example as shown below.
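Shrinking the batch size is just a matter of changing one argument when building the DataLoader. Here is a minimal sketch; the random dataset and the batch sizes 64 and 16 are only placeholders for illustration.

import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder dataset: 10,000 samples with 128 features each
dataset = TensorDataset(torch.randn(10000, 128), torch.randint(0, 2, (10000,)))

# A batch size of 64 may exhaust GPU memory on a small card ...
# train_loader = DataLoader(dataset, batch_size=64, shuffle=True)

# ... so reduce it; each iteration now moves much less data to the GPU at once
train_loader = DataLoader(dataset, batch_size=16, shuffle=True)
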

But if the problem occurs during testing, it may be because the model is still tracking and accumulating gradients.

In PyTorch, we need to switch the model to evaluation mode with eval(), and run the testing code inside a with torch.no_grad() block.

In this way, the model does not accumulate gradients, so the extra GPU memory they would occupy is never allocated.
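A minimal sketch of what such a testing loop looks like; the linear model and the random test data are only placeholders for illustration.

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Placeholder model and test data
model = nn.Linear(128, 2).cuda()
test_loader = DataLoader(TensorDataset(torch.randn(256, 128)), batch_size=16)

model.eval()  # switch layers such as Dropout and BatchNorm to evaluation mode

with torch.no_grad():  # disable gradient tracking, so no extra memory is kept
    for (inputs,) in test_loader:
        outputs = model(inputs.cuda())
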

