I have a huge dataset, and I have to calculate a series of properties for each point. My code is very slow, and I would like to parallelize the loop to make it faster. I would like each processor to compute the “series of properties” for a limited subsample of my data, and then combine all the properties into one array. I will try to explain what I have to do with an example.
Say my dataset is an array x:
import numpy as np

x = np.linspace(0, 20, 10000)
The “property” I want to get is, for example, the square root of x:
prop = []
for i in np.arange(0, len(x)):
    prop.append(np.sqrt(x[i]))
The question is, how can I parallelize the above loop? Suppose I have 4 processors, and I would like each of them to calculate sqrt for 10000/4 = 2500 of the points.
I tried looking at some Python modules, such as multiprocessing and mpi4py, but from the manuals I could not find the answer to such a simple question.
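From the docs, I imagine something like the following multiprocessing sketch is roughly what I'm after (compute_prop here is just a placeholder name for my real per-point calculation), but I don't know if this is the right approach:

import numpy as np
from multiprocessing import Pool

def compute_prop(xi):
    # placeholder for the real per-point computation
    return np.sqrt(xi)

if __name__ == '__main__':
    x = np.linspace(0, 20, 10000)
    with Pool(processes=4) as pool:
        # map() splits x across the 4 worker processes and
        # collects the results back into a single list
        prop = pool.map(compute_prop, x)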
Edit:
Thank you all for the valuable comments and links you provided. However, I would like to clarify my question. I am not interested in the sqrt function specifically; I perform a series of operations inside a loop. I know very well that loops are bad and that vectorized operations are always preferable, but in this case I really do need a loop. I will not go into the details of my problem, because they would add unnecessary complication to this question. I would like to split my loop so that each processor does part of it. I could run my code 40 times, each run doing 1/40 of the loop, and then merge the results, but that would be stupid. Here is a short example:
for i in np.arange(0, len(x)):
    # ... a series of operations on x[i] ...

I want to use 40 CPUs for this, splitting the index range into 40 chunks:

for ncpu in np.arange(0, 40):
    # each CPU would handle one contiguous chunk of the indices
    for i in np.arange(len(x) // 40 * ncpu, len(x) // 40 * (ncpu + 1)):
        # ... the same series of operations on x[i] ...
Is this possible with Python?
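To make the idea concrete, here is a sketch of how I imagine the chunked version could look, assuming the loop body can be wrapped in a function (process_chunk and the use of np.array_split are my own guesses, not something I found in the manuals):

import numpy as np
from multiprocessing import Pool

NCPU = 40

def process_chunk(chunk):
    # apply the "series of operations" to one subsample of x;
    # sqrt is just a stand-in for the real loop body
    out = []
    for xi in chunk:
        out.append(np.sqrt(xi))
    return out

if __name__ == '__main__':
    x = np.linspace(0, 20, 10000)
    chunks = np.array_split(x, NCPU)          # 40 subsamples, one per CPU
    with Pool(processes=NCPU) as pool:
        results = pool.map(process_chunk, chunks)
    prop = np.concatenate(results)            # merge back into one array

But again, I don't know whether multiprocessing is the right tool for this, or whether mpi4py would be better suited to using 40 CPUs.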