How to use a Python Pool.map multiprocessor to populate a numpy array in a for loop

I want to populate a 2D numpy array in a for loop and speed up the calculation using multiprocessing.

    import numpy
    from multiprocessing import Pool

    array_2D = numpy.zeros((20,10))
    pool = Pool(processes = 4)

    def fill_array(start_val):
        return range(start_val, start_val+10)

    list_start_vals = range(40,60)
    for line in xrange(20):
        array_2D[line,:] = pool.map(fill_array, list_start_vals)
    pool.close()

    print array_2D

When executed, Python starts 4 subprocesses and occupies 4 processor cores, but the execution never ends and the array is never printed. If I try to write the array to disk instead, nothing happens either.

Can someone tell me why?

3 answers

The following works. First of all, it is a good idea to protect the main part of your code inside a main block to avoid strange side effects. The result of pool.map() is a list containing the results for each value in the list_start_vals iterable, so you don't need to create array_2D beforehand.

    import numpy as np
    from multiprocessing import Pool

    def fill_array(start_val):
        return list(range(start_val, start_val+10))

    if __name__=='__main__':
        pool = Pool(processes=4)
        list_start_vals = range(40, 60)
        array_2D = np.array(pool.map(fill_array, list_start_vals))
        pool.close()  # ATTENTION HERE
        print array_2D

You might run into problems with pool.close(); per @hpaulj's comments, you can simply remove this line if you do.


If you still want to pre-allocate the array and fill it row by row, you can use pool.apply_async instead of pool.map. Building on Saullo's answer:

    import numpy as np
    from multiprocessing import Pool

    def fill_array(start_val):
        return range(start_val, start_val+10)

    if __name__=='__main__':
        pool = Pool(processes=4)
        list_start_vals = range(40, 60)
        array_2D = np.zeros((20,10))
        for line, val in enumerate(list_start_vals):
            result = pool.apply_async(fill_array, [val])
            array_2D[line,:] = result.get()
        pool.close()
        print array_2D

This runs a bit slower than map, but it does not produce a runtime error the way my map-based test did: Exception RuntimeError: RuntimeError('cannot join current thread',) in <Finalize object, dead> ignored


The problem is running pool.map inside the for loop. The result of the map() method is functionally equivalent to the built-in map(), except that the individual tasks run in parallel. So in your case, pool.map(fill_array, list_start_vals) is called 20 times, once per iteration of the for loop, and each call already processes all the start values in parallel. The code below should work.

The code:

    #!/usr/bin/python
    import numpy
    from multiprocessing import Pool

    def fill_array(start_val):
        return range(start_val, start_val+10)

    if __name__ == "__main__":
        array_2D = numpy.zeros((20,10))
        pool = Pool(processes = 4)
        list_start_vals = range(40,60)

        # running the pool.map in a for loop is wrong
        #for line in xrange(20):
        #    array_2D[line,:] = pool.map(fill_array, list_start_vals)

        # get the result of pool.map (list of values returned by
        # fill_array) in a pool_result list
        pool_result = pool.map(fill_array, list_start_vals)

        # the pool processes its inputs in parallel; close() and join()
        # can be used to synchronize the main process with the worker
        # processes and ensure proper cleanup
        pool.close()
        pool.join()

        # now assign the pool_result to your numpy array
        for line, result in enumerate(pool_result):
            array_2D[line,:] = result

        print array_2D
