Last Updated on 2020-11-10 by Clay
Introduction
Whatever programming language we use, we often choose the JSON format to store our data.
JSON (JavaScript Object Notation) is a lightweight data-interchange format that is readable and well suited to storing data. But in Python, reading JSON files can become a performance bottleneck, especially when the files are very large.
Today I want to record how to use ujson, a third-party Python package, to solve this problem.
The full name of ujson is UltraJSON. It is a fast JSON encoder and decoder written in pure C, and it supports Python 2.5+ and Python 3+.
The advantage of the ujson package is that it is much faster than Python's native json module. We can look at the statistics on PyPI (URL is here: https://pypi.org/project/ujson/ )
The benchmark table there shows that ujson is faster than the native json package. However, numbers are only numbers; before relying on a tool, you should test it yourself.
Using ujson to read a JSON file
The following is only a simple JSON file reading test. After all, the part I most want to speed up is usually the decoding step, that is, reading the file in.
If you are using ujson for the first time, you can install it with the following command.
sudo pip3 install ujson
After installing, we can use the du command to check how many GB our test JSON file takes up.
du --block-size=1G train_data.json
Output:
10 train_data.json
It is 10 GB.
Native json module
import time
import json
start = time.time()
with open('train_data.json', 'r') as f:
    text = json.load(f)
print(time.time() - start, 's')
Output:
191.61602115631104 s
ujson package
import time
import ujson
start = time.time()
with open('train_data.json', 'r') as f:
    text = ujson.load(f)
print(time.time() - start, 's')
Output:
110.23203444480896 s
The usage of ujson is almost the same as that of the native json module. The reading speed has also clearly improved: about 110 seconds instead of about 192 seconds, or roughly 1.7 times faster on this file.
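To repeat this comparison on your own data, the two timing snippets can be folded into one helper. This is only a minimal sketch, not the original test: the names time_load and compare and the small generated sample.json file are hypothetical, and time.perf_counter() is swapped in for time.time() because it is better suited to measuring intervals. If ujson is not installed, only the native json module is timed.

```python
import json
import os
import tempfile
import time

try:
    import ujson  # third-party package; see the install command above
except ImportError:
    ujson = None


def time_load(load, path):
    """Return (seconds, data) for parsing the JSON file at `path` with `load`."""
    start = time.perf_counter()
    with open(path, 'r') as f:
        data = load(f)
    return time.perf_counter() - start, data


def compare(path):
    """Time json.load and, if ujson is available, ujson.load on the same file."""
    results = {}
    results['json'], _ = time_load(json.load, path)
    if ujson is not None:
        results['ujson'], _ = time_load(ujson.load, path)
    return results


if __name__ == '__main__':
    # Hypothetical small sample file; point `path` at your own data instead.
    path = os.path.join(tempfile.gettempdir(), 'sample.json')
    with open(path, 'w') as f:
        json.dump([{'id': i, 'text': 'x' * 20} for i in range(100000)], f)
    for name, seconds in compare(path).items():
        print(name, seconds, 's')
```

A single run on a small file is noisy, so the numbers are only indicative; timing the real file a few times gives a fairer picture.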
Experience
(2020/03/29 Update) Since I started using “ujson” to handle the JSON files in my work environment, there have been several minor problems. The most obvious one is that although “ujson” does write and read much faster than the native json module, it also consumes more memory. Several times my program has failed with an OOM (Out Of Memory) error.
I recommend pursuing this speed only when there is enough memory available.
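One rough way to check the memory cost on your own machine is to look at the peak memory of the process after loading. The sketch below is an assumption on my side rather than the measurement behind the note above: it uses the standard library resource module (Unix only), and peak_rss_mb and load_and_report are hypothetical helper names.

```python
import json
import resource  # standard library, available on Unix only
import sys


def peak_rss_mb():
    """Return this process's peak resident set size in MB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    if sys.platform == 'darwin':
        return peak / (1024 * 1024)
    return peak / 1024


def load_and_report(load, path):
    """Parse `path` with `load` and return (data, peak RSS in MB afterwards)."""
    with open(path, 'r') as f:
        data = load(f)
    return data, peak_rss_mb()
```

Because the peak is a high-water mark that never goes down, each decoder should be measured in a fresh Python process, for example load_and_report(json.load, 'train_data.json') in one run and load_and_report(ujson.load, 'train_data.json') in another.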