Last Updated on 2024-01-15 by Clay
Problem
Yesterday, I was developing a model merging program. I did not have enough GPU memory to merge the models in one pass, so I tried to merge them layer by layer. I found that GPU memory is easy to release, but CPU memory is not.
I searched the Internet and found only one useful solution. Later I found another method that suits my case, so I record both of them below.
Solutions
First, let's look at how to release GPU memory.
model.to("cuda:0")
del model
torch.cuda.empty_cache()
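To confirm that the memory is really released, we can check torch.cuda.memory_allocated() before and after. Below is a minimal sketch, assuming a CUDA device is available and using the gpt2 checkpoint purely as an example:

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2").to("cuda:0")
print(torch.cuda.memory_allocated())  # Non-zero: the weights are on the GPU

del model                              # Drop the Python reference
torch.cuda.empty_cache()               # Return the cached blocks to the driver
print(torch.cuda.memory_allocated())   # Near 0 if nothing else still holds the tensors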
We only need a few lines to do it, but this approach cannot be used for CPU memory.
For the CPU scenario, many people recommend the following:
import gc

del model     # Drop the Python reference
gc.collect()  # Ask the garbage collector to reclaim the memory
Over the past day, this method has only worked for me once. Since I need to submit code that is reliable, I cannot gamble on luck.
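To see whether the memory really goes back to the operating system, I watch the resident set size (RSS) of the process before and after deleting the model. This is only a measurement sketch: it assumes the psutil package is installed, and again uses gpt2 as a stand-in model:

import gc
import psutil
from transformers import AutoModelForCausalLM

def rss_mb() -> float:
    # Resident set size of the current process, in MB
    return psutil.Process().memory_info().rss / 1024 ** 2

print(f"Before load: {rss_mb():.1f} MB")
model = AutoModelForCausalLM.from_pretrained("gpt2")
print(f"After load: {rss_mb():.1f} MB")

del model
gc.collect()
print(f"After del + gc: {rss_mb():.1f} MB")  # Often barely drops, which is the problem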
Some people shared that it is effective to wrap the model in a wrapper object and then delete the wrapper. I tried the lambda they recommended, but it did not work for me. Then, by chance, I saw the practice of wrapping the model in a list and deleting the list instead, and that worked!
import gc
from transformers import AutoModelForCausalLM
# Init: wrap the model in a list
models = []
models.append(AutoModelForCausalLM.from_pretrained("gpt2"))
# Operation: use the model through the list
models[0].generate(...)
# Delete the whole list, then run garbage collection
del models
gc.collect()
I have tried this method many times, and it releases the memory correctly every time.
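In my layer-by-layer merging loop, the pattern therefore ends up looking roughly like the sketch below. The checkpoint names and the merge step are placeholders, not my actual code:

import gc
from transformers import AutoModelForCausalLM

checkpoints = ["model-a", "model-b"]  # Hypothetical checkpoint names

for name in checkpoints:
    models = []                                         # Wrap the model in a list
    models.append(AutoModelForCausalLM.from_pretrained(name))
    # ... read the layers you need from models[0] here ...
    del models                                          # Delete the list itself
    gc.collect()                                        # CPU memory is released here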
Later, while optimizing my code, I accidentally discovered another way to clear the memory: move the model onto the meta device.
model.to("meta")
Moving a model to the meta device keeps only the model architecture without any weights. It is generally used for special purposes, but in this case it helped a lot.
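A minimal sketch of the meta-device trick, again with gpt2 as a stand-in; after the move, every parameter reports is_meta and the original weight data can be reclaimed:

import gc
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")
model = model.to("meta")                 # Only the architecture remains, no weight data
print(next(model.parameters()).is_meta)  # True
gc.collect()                             # The original CPU tensors can now be reclaimed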