Multiprocessing.Pool: What is the difference between map_async and imap?

Question

I am trying to learn how to use Python's multiprocessing package, but I don't understand the difference between map_async and imap. I noticed that both map_async and imap are executed asynchronously. So when should I use one over the other? And how should I retrieve the result returned by map_async?

Should I use something like this?

    def test():
        result = pool.map_async()
        pool.close()
        pool.join()
        return result.get()

    result = test()
    for i in result:
        print(i)

python multiprocessing python-multiprocessing

spacegoing Oct 23 '14 at 3:23

1 answer

dano · Accepted Answer · 2014-10-23 04:51

There are two key differences between imap/imap_unordered and map/map_async:

The way they consume the iterable you pass to them.
The way they return the result back to you.

map consumes your iterable by converting it to a list (if it is not a list already), breaking it into chunks, and sending those chunks to the worker processes in the Pool. Breaking the iterable into chunks performs better than passing each item between processes one at a time, particularly if the iterable is large. However, turning the iterable into a list in order to chunk it can have a very high memory cost, since the entire list must be kept in memory.

imap does not turn the iterable you give it into a list, and does not break it into chunks (by default). It iterates over the iterable one element at a time and sends each element to a worker process. This means you avoid the memory hit of converting the whole iterable to a list, but it also means that performance is slower for large iterables because of the lack of chunking. This can be mitigated by passing a chunksize argument larger than the default of 1.
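As a rough illustration of the chunksize point, here is a minimal sketch that feeds a generator to imap in batches. The names square and numbers, the pool size of 4, and the chunksize of 1000 are all made up for this example; a sensible chunksize depends on how expensive each call is.

```python
import multiprocessing


def square(x):
    # Runs in a worker process. Defined at module level so it can be pickled.
    return x * x


def numbers(n):
    # A generator: imap iterates over it directly instead of
    # converting it to a list first.
    for i in range(n):
        yield i


if __name__ == "__main__":
    with multiprocessing.Pool(processes=4) as pool:
        total = 0
        # chunksize=1000 sends items to the workers in batches of 1000
        # rather than one element at a time (the default is 1).
        for result in pool.imap(square, numbers(100000), chunksize=1000):
            total += result
        print(total)
```

With a very cheap function like this one, the per-item communication overhead dominates, so a larger chunksize tends to help; for slow functions the difference is usually much smaller.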

Another significant difference between imap/imap_unordered and map/map_async is that with imap/imap_unordered you can start receiving results from the workers as soon as they are ready, rather than having to wait for all of them to finish. With map_async, an AsyncResult is returned right away, but you cannot actually retrieve results from that object until all of them have been processed, at which point it returns the same list that map does (map is actually implemented internally as map_async(...).get()). There is no way to get partial results; you either have the entire result, or nothing.
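To make the map_async part concrete (and to address the "how do I get the result" part of the question), here is a small sketch. The function name add_two and the 10-second timeout are arbitrary choices for the example.

```python
import multiprocessing


def add_two(x):
    # Runs in a worker process.
    return x + 2


if __name__ == "__main__":
    with multiprocessing.Pool() as pool:
        # map_async returns an AsyncResult immediately, without blocking.
        async_result = pool.map_async(add_two, [1, 5, 3])

        # ... the parent process is free to do other work here ...

        # Block until every item has been processed, or 10 seconds pass.
        async_result.wait(timeout=10)
        if async_result.ready():
            # get() returns the complete list, in input order: [3, 7, 5]
            print(async_result.get())
```

Calling get() directly also works; it blocks until the whole list is available, which is what map does internally.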

imap and imap_unordered both return iterables right away. With imap, the results are yielded from the iterable as soon as they are ready, while still preserving the order of the input iterable. With imap_unordered, results are yielded as soon as they are ready, regardless of the order of the input iterable. So, say you have this:

    import multiprocessing
    import time

    def func(x):
        time.sleep(x)
        return x + 2

    if __name__ == "__main__":
        p = multiprocessing.Pool()
        start = time.time()
        for x in p.imap(func, [1,5,3]):
            print("{} (Time elapsed: {}s)".format(x, int(time.time() - start)))

This will output:

    3 (Time elapsed: 1s)
    7 (Time elapsed: 5s)
    5 (Time elapsed: 5s)

If you use p.imap_unordered instead of p.imap, you will see:

    3 (Time elapsed: 1s)
    5 (Time elapsed: 3s)
    7 (Time elapsed: 5s)

If you use p.map or p.map_async().get(), you will see:

    3 (Time elapsed: 5s)
    7 (Time elapsed: 5s)
    5 (Time elapsed: 5s)

So, the main reasons to use imap/imap_unordered over map_async are:

Your iterable is large enough that converting it to a list would cause you to run out of memory or use too much of it.
You want to start processing the results before all of them are completed.
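For the second reason, one pattern that may be useful with imap_unordered is to return the input together with the output, since the arrival order no longer tells you which input a result belongs to. This is only a sketch; the function name work and the use of a dict to collect results are illustrative choices.

```python
import multiprocessing


def work(x):
    # Return the input alongside the output, because imap_unordered
    # does not preserve the order of the input iterable.
    return x, x + 2


if __name__ == "__main__":
    with multiprocessing.Pool() as pool:
        results = {}
        for x, y in pool.imap_unordered(work, range(10)):
            # Each pair is handled as soon as a worker finishes it,
            # without waiting for the remaining items.
            results[x] = y
        print(sorted(results.items()))
```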
