
[Python] How to Avoid a Memory Error When Using open() to Open a Large File

When we use Python for data analysis, especially for natural language processing tasks, it is hard to avoid dealing with some large-scale files.

If you simply use the open() function to open these big files and read them all at once, you may run into memory error messages.

This is because calling read() on the file object with no size argument loads the whole file into memory at once.

So, how can we change our approach? The most common methods are:

  1. Use with to open the file and iterate over it line by line
  2. Use read([size]) to read the file in fixed-size chunks

In this way, we can avoid memory errors!

Oh, another way is to buy more memory.


Reproducing the situation where the error occurs

First of all, let me record the situation in which the error may occur:

# read() loads the entire file into memory, then split() builds a list of every line
text = open('data.txt', 'r', encoding='utf-8').read().split('\n')

for line in text:
    print(line)



When dealing with small files, I personally find this method quite convenient: the resulting text is a list of sentences, one per line, ready to use. However, when dealing with large files, this loads all of the data into memory at once, which puts a heavy burden on memory.
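To see the cost concretely, here is a minimal sketch, assuming a hypothetical large file named large.txt, that uses the standard tracemalloc module to measure the peak memory used by the read-everything approach above.

import tracemalloc

tracemalloc.start()

# Load the whole file and split it into a list of lines, exactly as above
text = open('large.txt', 'r', encoding='utf-8').read().split('\n')

current, peak = tracemalloc.get_traced_memory()
print(f'peak memory: {peak / 1024 / 1024:.1f} MiB for {len(text)} lines')

tracemalloc.stop()

The peak figure grows roughly with the file size, which is exactly the problem the next two methods avoid.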


Use with to open the file

with open('text.txt', 'r', encoding='utf-8') as f:
    # The file object is an iterator: each loop reads only one line into memory
    for line in f:
        print(line)



We can see that when reading large files, this approach starts producing output much sooner and does not raise a memory error, because only one line is held in memory at a time.
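In practice you usually do more than print each line. Here is a minimal sketch, again assuming a hypothetical text.txt, that counts lines and characters while still holding only one line in memory at a time.

line_count = 0
char_count = 0

with open('text.txt', 'r', encoding='utf-8') as f:
    for line in f:
        # Each iteration sees a single line; nothing accumulates except the counters
        line_count += 1
        char_count += len(line)

print(f'{line_count} lines, {char_count} characters')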


Use read([size]) to read the file in chunks

So, if using with to open the file works so well, why bother with read([size])?

This is because sometimes there may be no line breaks in our text at all. In that case, iterating line by line with with is no different from reading everything in one go, since the single "line" is the entire file.

Because of this possibility, it is sometimes necessary to use read([size]) to control how much is read at a time.

with open('text.txt', 'r', encoding='utf-8') as f:
    # read(1024) returns at most 1024 characters; iter() stops at the empty string returned at EOF
    for chunk in iter(lambda: f.read(1024), ''):
        print(chunk)
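If you prefer not to use iter() with a sentinel, a plain while loop does the same thing; on Python 3.8+ an assignment expression keeps it compact. A minimal sketch, again assuming text.txt:

with open('text.txt', 'r', encoding='utf-8') as f:
    # Read fixed-size chunks until read() returns an empty string at EOF
    while chunk := f.read(1024):
        print(chunk)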



