Does multiprocessing.pool.imap have an option (e.g. starmap) that allows multiple arguments?

I do some calculations on large sets of bytes. The processing runs on chunks of bytes. I am trying to use parallel processing via multiprocessing to improve performance. I initially tried pool.map, but it only allows one argument, then I found out about pool.starmap. However, pool.starmap only gives results after all processes have completed, and I need results as they come (sort of). So I tried pool.imap, which yields results as each task finishes, but it does not allow multiple arguments (my function requires 2). In addition, the order of the results is important.

Sample code below:

    pool = mp.Pool(processes=4)
    y = []
    for x in pool.starmap(f, zip(da, repeat(db))):
        y.append(x)

The above code works, but gives results only after all processes have completed, so I do not see any progress. That is why I tried pool.imap, which works well, but only with a single argument:

    pool = mp.Pool(processes=4)
    y = []
    for x in pool.imap(f, da):
        y.append(x)

Passing several arguments raises the following exception:

 TypeError: f() missing 1 required positional argument: 'd' 

Looking for an easy way to achieve all 3 requirements:

  • parallel processing with multiple parameters / arguments
  • the ability to see progress while the work is running
  • ordered results

Thanks!

2 answers

I can answer the first two questions pretty quickly. I think you should be able to deal with the third question after understanding the first two.

1. Parallel processing with several arguments

I am not sure about an imap equivalent of starmap, but here is an alternative. What I have done in the past is reduce my arguments to a single data object, such as a list or tuple. For example, if you want to pass three arguments to your map_function , you can pack those arguments into a list, and then use the list with the .map() or .imap() function.

    from multiprocessing import Pool

    def map_function(combo):
        a = combo[0]
        b = combo[1]
        c = combo[2]
        return a + b + c

    if __name__ == '__main__':
        # each item in the iterable is one packed set of arguments
        combos = [(arg_1, arg_2, arg_3)]
        pool = Pool(processes=4)
        results = pool.map(map_function, combos)

2. Tracking progress

A good way to do this is to use a multiprocessing shared Value. I actually asked (almost) exactly this question about a month ago. A shared value lets you manipulate the same variable from the different processes created by your map function. For the sake of learning, I will let you read and understand the shared-value solution on your own. If you still have problems after a few attempts, I will be more than happy to help, but I believe that teaching yourself to understand something is much more valuable than me giving you the answer.

Hope this helps!


I think this solution exactly matches your 3 requirements: fooobar.com/questions/706917 / ...

In short, p = Pool(); p.imap will allow you to see progress and maintain order. If you need map functions with multiple arguments, you can use a fork of multiprocessing (such as the multiprocess package), which provides better serialization and supports multiple arguments. See the link for an example.

