I have a sum that I am trying to compute, and I am having difficulty parallelizing the code. The calculation I am trying to parallelize is fairly complex (it uses both numpy arrays and scipy sparse matrices). It spits out a numpy array, and I want to sum the output arrays from about 1000 calculations. Ideally, I would keep a running sum over all the iterations, but I have not been able to figure out how to do this.
So far, I have tried joblib's Parallel function and the pool.map function from python's multiprocessing package. For both of these, I use an inner function that returns a numpy array. These calls return a list of arrays, which I convert to a numpy array and then sum over.
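For reference, here is roughly what my pool.map version looks like (compute_term is a stand-in for my real calculation, which I have omitted):

    import numpy as np
    from multiprocessing import Pool

    def compute_term(i):
        # stand-in for the real calculation, which uses numpy arrays
        # and scipy sparse matrices and returns a 1 x 512*512 array
        return np.ones((1, 512*512))

    if __name__ == '__main__':
        with Pool() as pool:
            # one output array per iteration, all held in memory at once
            results = pool.map(compute_term, range(1000))
        total = np.sum(np.array(results), axis=0)

Holding all 1000 output arrays at once is on the order of 2 GB of float64 data before the sum even starts.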
However, after the joblib Parallel function completes all the iterations, the main program never resumes (the parent process appears to be stuck in a sleep state, using 0% CPU). When I use pool.map, I get memory errors after all the iterations are complete.
Is there a way to simply parallelize a running sum of arrays?
Edit: The goal is to do something like the following, except in parallel.
    import numpy as np

    def summers(num_iters):
        sumArr = np.zeros((1, 512*512))  # running total of the output arrays
        # each of the num_iters calculations produces a 1 x 512*512 array
        # that gets added into sumArr here
        return sumArr
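One arrangement I have been considering (sketched below with a hypothetical partial_sum helper and the same compute_term stand-in as above) is to have each worker keep its own running sum over a chunk of the iterations, so that only one array per worker ever travels back to the main process:

    import numpy as np
    from multiprocessing import Pool

    def compute_term(i):
        # stand-in for the real per-iteration calculation
        return np.ones((1, 512*512))

    def partial_sum(iter_range):
        # hypothetical helper: each worker accumulates its own chunk,
        # so only one array per worker returns to the main process
        sumArr = np.zeros((1, 512*512))
        for i in iter_range:
            sumArr += compute_term(i)
        return sumArr

    if __name__ == '__main__':
        chunks = [range(k, k + 250) for k in range(0, 1000, 250)]
        with Pool(4) as pool:
            total = sum(pool.map(partial_sum, chunks))  # add the four partial sums

That would cut the peak memory from 1000 arrays down to one per worker, but I do not know whether it is the right way to go, or whether it would avoid the hang I see with joblib.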
Kevin