
[Python] How To Compute The Memory Usage of Variables And Objects

As projects become increasingly concerned with execution speed and memory usage, it is important for us to understand how much memory the variables and objects in a program actually use.

In Python, many people recommend using the built-in sys.getsizeof() to get the memory size of a variable. However, because of how it is implemented, this function only reports the shallow size of an object, so we cannot use it to correctly analyze the actual memory usage.

For example, today I was using the HuggingFace transformers package and instantiated a T5 tokenizer. But if you use sys.getsizeof() to check its memory usage, you will find that the reported size of the tokenizer is not even larger than that of an integer variable!

This is obviously contrary to our intuition.
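A quick check makes the problem clear. The following is a minimal sketch (assuming the same t5-small tokenizer used later in this post) that compares the sys.getsizeof() results for an integer and for the tokenizer. Because getsizeof() only counts the object itself, not the vocabulary and other data it references, both values come out to only a few dozen bytes.

# coding: utf-8
import sys
from transformers import AutoTokenizer


tokenizer = AutoTokenizer.from_pretrained("t5-small")

# getsizeof() only counts the shallow size of the object itself,
# not the vocabulary and other data it references.
print(sys.getsizeof(1))          # An integer: a few dozen bytes
print(sys.getsizeof(tokenizer))  # Also only a few dozen bytes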

There is a third-party package, memory-profiler, that can help. At least when I wrote this post (2022-11-24), it worked correctly in my code. However, the developer has indicated that they are no longer actively maintaining it, and they welcome anyone who wants to take over maintenance to contact them.

Below, I will briefly record how to use this package to analyze the memory used by variables and objects.


How to use memory-profiler

First, we can use the pip command to install it:

pip3 install memory-profiler


Let’s take a look at some sample code:

# coding: utf-8
from memory_profiler import profile
from transformers import AutoTokenizer


@profile
def main() -> None:
    # Init
    a = 1
    b = list(range(1000))
    tokenizer = AutoTokenizer.from_pretrained("t5-small")

    # Delete
    del a
    del b
    del tokenizer


if __name__ == "__main__":
    main()


Output:

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
     6     75.6 MiB     75.6 MiB           1   @profile
     7                                         def main() -> None:
     8                                             # Init
     9     75.6 MiB      0.0 MiB           1       a = 1
    10     75.6 MiB      0.0 MiB           1       b = list(range(1000))
    11    109.3 MiB     33.7 MiB           1       tokenizer = AutoTokenizer.from_pretrained("t5-small")
    12                                         
    13                                             # Delete
    14    109.3 MiB      0.0 MiB           1       del a
    15    109.3 MiB      0.0 MiB           1       del b
    16    106.4 MiB     -2.9 MiB           1       del tokenizer


As shown above, we can easily see how the program’s memory usage increases line by line.
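In addition to the @profile decorator, memory-profiler also provides a memory_usage() helper that samples the process memory while a function runs. Below is a minimal sketch of how it might be used, reusing the t5-small tokenizer from the example above; the 0.1-second sampling interval is just an illustrative choice.

# coding: utf-8
from memory_profiler import memory_usage
from transformers import AutoTokenizer


def load_tokenizer() -> None:
    AutoTokenizer.from_pretrained("t5-small")


# Sample the process memory (in MiB) every 0.1 seconds while load_tokenizer() runs
usage = memory_usage((load_tokenizer, (), {}), interval=0.1)
print(f"Peak memory: {max(usage):.1f} MiB")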

