If you use pool.map_async, you can extract this information from the MapResult instance that it returns. For example:
import multiprocessing
import time

def worker(i):
    time.sleep(i)
    return i

if __name__ == "__main__":
    pool = multiprocessing.Pool()
    result = pool.map_async(worker, range(15))
    while not result.ready():
        print("num left: {}".format(result._number_left))
        time.sleep(1)
    real_result = result.get()
    pool.close()
    pool.join()
Output:
num left: 15
num left: 14
num left: 13
num left: 12
num left: 11
num left: 10
num left: 9
num left: 9
num left: 8
num left: 8
num left: 7
num left: 7
num left: 6
num left: 6
num left: 6
num left: 5
num left: 5
num left: 5
num left: 4
num left: 4
num left: 4
num left: 3
num left: 3
num left: 3
num left: 2
num left: 2
num left: 2
num left: 2
num left: 1
num left: 1
num left: 1
num left: 1
multiprocessing internally breaks the iterable you pass to map into chunks and sends each chunk to a child process. So the _number_left attribute actually tracks the number of remaining chunks, not the number of individual items in the iterable. Keep this in mind if you see strange numbers with large iterables. Chunking is done to improve IPC performance, but if an accurate count of completed results matters more to you than the extra performance, you can pass chunksize=1 to map_async to make _number_left exact. (chunksize usually only makes a noticeable performance difference for very large iterables; test it yourself to see whether it matters for your use case.)
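For example, changing the map_async call from the snippet above to pass chunksize=1 makes _number_left count individual items rather than chunks:

    # Each item becomes its own chunk, so _number_left counts remaining items
    result = pool.map_async(worker, range(15), chunksize=1)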
As you mentioned in the comments, because pool.map is blocking, you can't get this kind of information from it unless you run a background thread that polls while the main thread is blocked in the map call, but I'm not sure there's any benefit to doing that over the approach above.
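If you did want to go that route, a rough sketch might look like this; it assumes a shared multiprocessing.Value counter that each worker increments, and the init_counter/poll_progress helpers are illustrative names, not part of the original code:

import multiprocessing
import threading
import time

counter = None

def init_counter(shared):
    # Runs once in each worker process so workers can see the shared counter
    global counter
    counter = shared

def worker(i):
    time.sleep(i)
    with counter.get_lock():
        counter.value += 1  # mark this item as finished
    return i

def poll_progress(shared, total, done):
    # Background thread in the main process: report progress until told to stop
    while not done.is_set():
        print("completed: {}/{}".format(shared.value, total))
        time.sleep(1)

if __name__ == "__main__":
    total = 15
    shared = multiprocessing.Value('i', 0)
    done = threading.Event()
    poller = threading.Thread(target=poll_progress, args=(shared, total, done))
    poller.start()
    pool = multiprocessing.Pool(initializer=init_counter, initargs=(shared,))
    results = pool.map(worker, range(total))  # blocks until everything is done
    done.set()
    poller.join()
    pool.close()
    pool.join()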
Another thing to keep in mind is that _number_left is an internal attribute of MapResult, so it could break in future versions of Python.
dano