Last Updated on 2022-07-25 by Clay
Python is a good language for data processing, and its pandas package provides many convenient functions. What I want to record today is how to display a progress bar with the tqdm package while iterating over a pandas DataFrame with iterrows().
How to use
In my own tests, using iterrows() to read rows was faster than an ordinary for loop over the DataFrame.
You can use it like:
for index, row in df.iterrows():
    # your code...
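As a minimal sketch of what the loop above yields, the following uses a small made-up DataFrame (the column names and values are assumptions for illustration only). Each iteration gives the row label and the row itself as a pandas Series.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"name": ["a", "b", "c"], "score": [1, 2, 3]})

total = 0
for index, row in df.iterrows():
    # `index` is the row label; `row` is a pandas Series keyed by column name.
    total += row["score"]

print(total)
```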
However, you cannot see the current progress while iterating; the only option is to print the index and the length of the data by hand.
That brute-force approach is not well suited to Jupyter Notebook, because the output area of a notebook cell has limited space. So naturally we want to use tqdm.
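For comparison, the manual approach might look roughly like the sketch below. The DataFrame and the `lines` list are assumptions added for illustration; the point is that it prints one line per row, which quickly fills a notebook cell.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"value": range(5)})
n_rows = len(df)

lines = []
for position, (index, row) in enumerate(df.iterrows(), start=1):
    # Build a "current/total" message by hand and print it on every row.
    lines.append(f"{position}/{n_rows}")
    print(lines[-1])
```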
In fact, tqdm can display a progress bar for pandas DataFrame iteration. Put simply, we can pass the total parameter to tell tqdm how many rows there are. This is needed because iterrows() returns a generator, which has no length, so tqdm cannot work out the row count on its own.
After all, tqdm is a long-established package, and it is quite flexible in how it can be configured and applied.
So, if we want to display a tqdm progress bar while iterating over a pandas DataFrame, we just need to write code like the following:
from tqdm import tqdm

for index, row in tqdm(df.iterrows(), total=df.shape[0]):
    # your code...
The progress bar is then displayed correctly.
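Putting the pieces together, here is a self-contained sketch that can be pasted into a script or notebook cell. The DataFrame contents and the `squares` list are assumptions for illustration; only the tqdm(df.iterrows(), total=df.shape[0]) pattern comes from the text above.

```python
import pandas as pd
from tqdm import tqdm

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"value": range(1000)})

squares = []
# Passing total lets tqdm draw a full bar with a percentage and an ETA,
# since the generator returned by iterrows() has no length of its own.
for index, row in tqdm(df.iterrows(), total=df.shape[0]):
    squares.append(row["value"] ** 2)
```

Without total, tqdm still runs, but it can only show the iteration count and rate rather than a filled bar.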
References
- https://stackoverflow.com/questions/47087741/use-tqdm-progress-bar-with-pandas
- https://stackoverflow.com/questions/16476924/how-to-iterate-over-rows-in-a-dataframe-in-pandas