[Python] How To Use Multiprocessing Pool And Display Progress Bar

Python is a popular, easy, and elegant programming language, but its performance has long been criticized by users of other languages. For data pre-processing in particular, multi-threading and multi-processing can therefore make a big difference.

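What I want to record today is how to use a process pool in Python. On multi-core CPUs, a process pool often achieves higher CPU utilization than threading alone, because each worker is a separate process with its own interpreter. In addition, the death of one worker process does not necessarily bring down the whole program.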

I have heard that such crashes can happen with threading, but I have not run into one personally.

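Note, however, that you need to be careful when multiple processes access the same file or the same variables. Each process works on its own copy of ordinary Python variables, so a change made inside a worker is not visible in the parent process.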

If the situation permits, it is better to have each task return its data and to combine the return values in the parent process, which avoids losing data.
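As a minimal sketch of that recommendation, the example below has each task return its own partial result, and only the parent process merges them. The function names count_words and merge_counts and the sample lines are made up for illustration; pool.map itself is explained in the next section.

```python
import multiprocessing as mp
from collections import Counter


def count_words(line):
    # Each worker builds its own Counter; nothing is shared between processes.
    return Counter(line.split())


def merge_counts(partials):
    # Only the parent process combines the partial results.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


if __name__ == '__main__':
    lines = ['a b a', 'b c', 'a c c']  # illustrative input
    with mp.Pool(processes=2) as pool:
        partials = pool.map(count_words, lines)
    total = merge_counts(partials)
    print(sorted(total.items()))  # [('a', 3), ('b', 2), ('c', 3)]
```

Because the workers never write to a shared file or variable, there is nothing for them to overwrite or lose.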


How To Use Pool

First of all, multiprocessing is part of the Python standard library and requires no additional installation. The work we want to run in multiple processes needs to be written as a function.

The following is the simplest example program:

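# coding: utf-8
import multiprocessing as mp


# Task: runs in a worker process, once per input element
def task(item):
    return item % 10


# The __main__ guard keeps worker processes from re-running this block
if __name__ == '__main__':
    pool = mp.Pool(processes=4)
    inputs = range(10)

    # Blocks until every task is done; results follow the input order
    results = pool.map(task, inputs)
    print(results)

    # Release the worker processes
    pool.close()
    pool.join()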



Output:

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

First we need to use:

pool = mp.Pool(processes=4)



This creates a process pool. The processes argument sets the number of worker processes, not the number of CPU cores as such. If it is omitted, Pool defaults to the number of CPUs reported by the system. Then use:

results = pool.map(task, inputs)



Here, inputs is a Python iterable. pool.map() passes each element to the task() function we defined and runs the calls in parallel across the worker processes, which can shorten the total running time.

results is a list of the return values, in the same order as inputs. It is available once all tasks have completed.

The above is the simplest Python pool program.
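A slightly different way to write the same pattern is sketched below, assuming a made-up task called square and a hypothetical helper run_squares. It uses the pool as a context manager so the workers are cleaned up automatically, and it shows that pool.map returns results in input order whether the worker count is given or left at its default.

```python
import multiprocessing as mp
import os


def square(x):
    return x * x


def run_squares(values, processes=None):
    # processes=None lets Pool choose its default number of workers.
    with mp.Pool(processes=processes) as pool:
        # map() blocks until all results are ready, in input order.
        return pool.map(square, values)


if __name__ == '__main__':
    print('CPUs reported by the system:', os.cpu_count())
    print(run_squares(range(5)))               # [0, 1, 4, 9, 16]
    print(run_squares(range(5), processes=2))  # [0, 1, 4, 9, 16]
```

Leaving the with block terminates the pool, so the results should be fully collected inside it, as pool.map does here.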


Display Progress Bar

When a job is very large, a progress bar helps us confirm at any time that the program is still running normally. The following approach adds one.

First of all, it is recommended to install tqdm, a Python package that visualizes iterations:

pip3 install tqdm

Then, change the program to:

# coding: utf-8
import multiprocessing as mp
import tqdm


# Task
def task(item):
    return item % 10


if __name__ == '__main__':
    pool = mp.Pool(processes=4)
    inputs = range(10)

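    results = []
    # imap_unordered yields each result as soon as a worker finishes it,
    # so tqdm can advance the bar once per completed task
    for result in tqdm.tqdm(pool.imap_unordered(task, inputs), total=len(inputs)):
        results.append(result)

    print(results)

    # Release the worker processes
    pool.close()
    pool.join()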



Output:

100%|████████████████████████████████████████████| 10/10 [00:00<00:00, 20877.57it/s]
[0, 1, 2, 4, 3, 5, 6, 7, 8, 9]

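In this way, we get the speed of multiprocessing while still seeing the current progress clearly. Note that imap_unordered() yields results in the order the tasks finish, not in input order, which is why 4 appears before 3 in the output above.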
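If the results need to line up with the inputs, one possible approach is to send each item's position along with it and put the results back in place in the parent process. The sketch below assumes two hypothetical helpers, indexed_task and collect_in_order, which are not part of multiprocessing or tqdm.

```python
import multiprocessing as mp


def task(item):
    return item % 10


def indexed_task(pair):
    # Carry the input position along so the parent can restore the order.
    index, item = pair
    return index, task(item)


def collect_in_order(indexed_results, size):
    # Place each value at the position of the input it came from.
    ordered = [None] * size
    for index, value in indexed_results:
        ordered[index] = value
    return ordered


if __name__ == '__main__':
    # Imported here so the helpers above stay usable without tqdm installed.
    from tqdm import tqdm

    inputs = range(10)
    with mp.Pool(processes=4) as pool:
        stream = pool.imap_unordered(indexed_task, enumerate(inputs))
        results = collect_in_order(tqdm(stream, total=len(inputs)), len(inputs))
    print(results)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

A simpler alternative is pool.imap(), which yields results in input order. The trade-off is that one slow early task then holds the progress bar back until it finishes.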

