Counting the number of completed tasks in a multiprocessing pool.

I would like to get a general idea of the progress of a long-running job. If I send 100 jobs to 10 processors, how can I show how many jobs have been returned so far? I can get the process id, but how do I count the number of completed jobs returned by my map function?

I call my function as follows:

 op_list = pool.map(PPMDR_star, list(varg)) 

And inside my function I can print the name of the current process:

current = multiprocessing.current_process()
print('Running:', current.name, current._identity)
python parallel-processing multiprocessing
1 answer

If you use pool.map_async, you can extract this information from the MapResult instance it returns. For example:

import multiprocessing
import time

def worker(i):
    time.sleep(i)
    return i

if __name__ == "__main__":
    pool = multiprocessing.Pool()
    # map_async returns a MapResult immediately instead of blocking,
    # so we can poll its progress while the workers run.
    result = pool.map_async(worker, range(15))
    while not result.ready():
        print("num left: {}".format(result._number_left))
        time.sleep(1)
    real_result = result.get()
    pool.close()
    pool.join()

Output:

num left: 15
num left: 14
num left: 13
num left: 12
num left: 11
num left: 10
num left: 9
num left: 9
num left: 8
num left: 8
num left: 7
num left: 7
num left: 6
num left: 6
num left: 6
num left: 5
num left: 5
num left: 5
num left: 4
num left: 4
num left: 4
num left: 3
num left: 3
num left: 3
num left: 2
num left: 2
num left: 2
num left: 2
num left: 1
num left: 1
num left: 1
num left: 1

multiprocessing internally breaks the iterable you pass to map into chunks and sends each chunk to the child processes. So the _number_left attribute actually tracks the number of chunks remaining, not the number of individual items in the iterable. Keep that in mind if you see odd numbers with large iterables. Chunking is used to improve IPC performance, but if an accurate count of completed results matters more to you than the extra performance, you can pass the chunksize=1 keyword to map_async to make _number_left exact. (chunksize usually only makes a noticeable performance difference for very large iterables, so try it and see whether it actually matters for your use case.)
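
For example, the only change needed from the example above is the extra chunksize=1 keyword argument:

import multiprocessing
import time

def worker(i):
    time.sleep(i)
    return i

if __name__ == "__main__":
    pool = multiprocessing.Pool()
    # With chunksize=1 every item is its own task, so _number_left
    # counts individual unfinished items rather than chunks.
    result = pool.map_async(worker, range(15), chunksize=1)
    while not result.ready():
        print("num left: {}".format(result._number_left))
        time.sleep(1)
    real_result = result.get()
    pool.close()
    pool.join()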

As you mentioned in the comments, because pool.map blocks, you can't get this information from it unless you run a background thread that polls progress while the main thread sits blocked in the map call, and I'm not sure that buys you anything over the approach above.
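
For completeness, here is a rough sketch of what such a workaround could look like: the workers increment a shared counter (the init helper and counter names are just illustrative), and a background thread polls it while the main thread is blocked in pool.map. It isn't obviously better than the map_async loop above.

import multiprocessing
import threading
import time

counter = None  # filled in by the pool initializer in each worker process

def init(shared_counter):
    global counter
    counter = shared_counter

def worker(i):
    time.sleep(i)
    with counter.get_lock():
        counter.value += 1  # mark one more job as finished
    return i

def report_progress(shared_counter, total):
    # Runs in a thread in the main process while pool.map blocks.
    while shared_counter.value < total:
        print("done: {}/{}".format(shared_counter.value, total))
        time.sleep(1)

if __name__ == "__main__":
    shared_counter = multiprocessing.Value('i', 0)
    pool = multiprocessing.Pool(initializer=init, initargs=(shared_counter,))
    reporter = threading.Thread(target=report_progress,
                                args=(shared_counter, 15))
    reporter.daemon = True
    reporter.start()
    op_list = pool.map(worker, range(15))  # main thread blocks here
    pool.close()
    pool.join()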

Another thing to keep in mind is that _number_left is an internal attribute of MapResult, so it could break in future versions of Python.
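
If you would rather avoid internal attributes entirely, one alternative sketch is to iterate over pool.imap_unordered and count the results yourself as they arrive; note that you then get the results in completion order rather than input order.

import multiprocessing
import time

def worker(i):
    time.sleep(i)
    return i

if __name__ == "__main__":
    pool = multiprocessing.Pool()
    n_jobs = 15
    results = []
    # imap_unordered yields each result as soon as it finishes,
    # so counting completions needs no internal attributes.
    for done, r in enumerate(pool.imap_unordered(worker, range(n_jobs)), 1):
        results.append(r)
        print("completed: {}/{}".format(done, n_jobs))
    pool.close()
    pool.join()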
