Using Python multiprocessing.pool.map to control the same integer

Problem

I am using Python's multiprocessing module to execute functions asynchronously. What I want to do is track the overall progress of my script as each process calls and executes add_print. For example, I would like the code below to add 1 to total and print the value (1 2 3 ... 18 19 20) every time a process runs that function. My first attempt was to use a global variable, but that didn't work: since the function is called asynchronously, each process starts with its own copy of total at 0 and adds 1 independently of the other processes. Thus, the output is twenty 1s instead of an increasing count.

How can I access the same block of memory from my function synchronously, even though the function itself is executed asynchronously? One idea was to somehow keep total in shared memory and then reference that exact block of memory whenever I add to total. Is that a possible and fundamentally sound approach in Python?

Please let me know if you need more information, or if I do not explain something well enough.

Thanks!


Code

```python
#!/usr/bin/python
## Import builtins
from multiprocessing import Pool

total = 0

def add_print(num):
    global total
    total += 1
    print total

if __name__ == "__main__":
    nums = range(20)
    pool = Pool(processes=20)
    pool.map(add_print, nums)
```
1 answer

You can use a shared Value:

```python
import multiprocessing as mp

def add_print(num):
    total.value += 1
    print(total.value)

def setup(t):
    global total
    total = t

if __name__ == "__main__":
    total = mp.Value('i', 0)
    nums = range(20)
    pool = mp.Pool(initializer=setup, initargs=[total])
    pool.map(add_print, nums)
```

The pool initializer calls setup once in each worker subprocess. setup makes total a global variable in that worker process, so total can be accessed inside add_print when the worker calls add_print.

Note that the number of processes generally should not exceed the number of CPUs your machine has; the extra subprocesses simply wait for a CPU to become available. So do not use processes=20 unless you have 20 or more CPUs. If you omit the processes argument, multiprocessing determines the number of available CPUs and creates a pool with that many workers for you. The number of tasks (e.g., the length of nums) usually far exceeds the number of CPUs, and that is fine: tasks are queued and picked up by whichever worker becomes available.
