Last Updated on 2023-12-12 by Clay
Problem
Last night I tried to improve some code for merging two models. Because of my poor device, I cannot merge all the layers at once; I need to merge them layer by layer to reduce the memory cost.
I found that freeing GPU memory is very easy, but CPU memory is another story: I deleted the model, yet the memory was not released.
Solution
First, let's discuss releasing GPU memory.
import torch

model.to("cuda:0")
del model
torch.cuda.empty_cache()  # Release the cached VRAM held by PyTorch
We just need one line to free the VRAM. Of course, this doesn't work for CPU memory.
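If you want to confirm the VRAM is really released, a quick check like the one below works; it simply compares torch.cuda.memory_allocated() before and after (gpt2 is only a placeholder checkpoint here, this is my own sketch and not part of the original code):

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2").to("cuda:0")
print(torch.cuda.memory_allocated())  # Non-zero: the weights occupy VRAM

del model
torch.cuda.empty_cache()
print(torch.cuda.memory_allocated())  # Should drop back to (almost) zero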
When the model is allocated in CPU memory, many people recommend the following code to free it:
import gc

del model
gc.collect()  # Manually trigger Python's garbage collection
This method has only worked once in the past day. Since I need to submit code that's reliable enough, I can't gamble on luck.
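To check whether the memory actually comes back, I compare the process RSS before and after. Here is a rough sketch of that check (it assumes psutil is installed; the reading is only approximate, since the allocator may keep pages around, and it is my own verification code rather than part of the original post):

import gc
import os

import psutil
from transformers import AutoModelForCausalLM

process = psutil.Process(os.getpid())

model = AutoModelForCausalLM.from_pretrained("gpt2")
print(process.memory_info().rss / 1024 ** 2, "MiB")  # After loading

del model
gc.collect()
print(process.memory_info().rss / 1024 ** 2, "MiB")  # Often barely drops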
Some people share another idea: wrap the model in a container and delete the container. Wrapping it in a lambda doesn't work, but a list does!
import gc
from transformers import AutoModelForCausalLM
# Init
models = []
models.append(AutoModelForCausalLM.from_pretrained("gpt2"))
# Operation
models[0].generate(...)
# Delete
del models
gc.collect()
I have verified this method many times, and it works smoothly.
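Back to my original goal of merging two models layer by layer: the sketch below shows roughly how the list wrapper fits in. The 50/50 weight averaging and the two gpt2 checkpoints are only placeholders for illustration, not my actual merging code:

import gc
from transformers import AutoModelForCausalLM

# Load the first model inside a list so it can be fully released later
models = [AutoModelForCausalLM.from_pretrained("gpt2")]

# Accumulate half of every weight tensor into a plain dict
merged_state_dict = {
    name: param.detach().clone() * 0.5
    for name, param in models[0].named_parameters()
}

# Release the first model before loading the second one
del models
gc.collect()

models = [AutoModelForCausalLM.from_pretrained("gpt2")]
for name, param in models[0].named_parameters():
    merged_state_dict[name] += param.detach() * 0.5

del models
gc.collect()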
Later, when I was optimizing my code, I accidentally discovered another way to clear the memory, which is to throw the model onto the meta device:
model.to("meta")
This leaves an empty model structure without any real weights. The meta device is usually meant for other specific purposes, but it helped me a lot here.
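As a rough illustration (again with gpt2 only as a placeholder, and this check is my own addition), the parameters really do lose their storage after the move:

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")

# Moving to the meta device drops the real weight storage
model.to("meta")

# Every parameter is now a meta tensor: it keeps its shape but holds no data
first_param = next(model.parameters())
print(first_param.device)   # meta
print(first_param.is_meta)  # True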
The above are two methods for releasing model memory on the CPU.