I do some calculations on large sets of bytes, processing them in chunks. I am trying to use parallel processing via the multiprocessing module to improve performance. I initially tried pool.map, but it only allows one argument, then I found pool.starmap. However, pool.starmap returns results only after all processes have completed, and I need results as they arrive (more or less). I then tried pool.imap, which yields results as each task finishes, but it does not allow multiple arguments (my function requires 2 arguments). In addition, the order of the results matters.
Sample code below:
import multiprocessing as mp
from itertools import repeat

pool = mp.Pool(processes=4)
y = []
for x in pool.starmap(f, zip(da, repeat(db))):
    y.append(x)
The above code works, but it gives results only after all processes have completed, so I cannot see any progress. This is why I tried pool.imap; it works well, but only with a single argument:
pool = mp.Pool(processes=4)
y = []
for x in pool.imap(f, da):
    y.append(x)
Passing several arguments raises the following exception:
TypeError: f() missing 1 required positional argument: 'd'
Looking for an easy way to achieve all 3 requirements:
- parallel processing using multiple parameters / arguments
- ability to see progress while the work is running
- ordered results.
Thanks!