Need help for loop parallelization in python

Question

Need help for loop parallelization in python

I have a huge data set, and I have to calculate for each point its series of properties. My code is very slow and I would like it to parallelize the do loop faster. I would like each processor to compute a “series of properties” for a limited subsample of my data, and then combine all the properties together in one array. I will try to explain what I have to do with an example.

Say my dataset is an x array:

 x = linspace(0,20,10000)

The “property” I want to get is, for example, the square root of x :

 prop=[] for i in arange(0,len(x)): prop.append(sqrt(x[i]))

The question is, how can I parallelize the above loop? Suppose I have 4 processors, and I would like each of them to calculate sqrt from 10000/4 = 2500 points.

I tried to look at some python modules, such as multiprocessing and mpi4py , but from the manuals I could not find the answer to such a simple question.

edits

I thank you all for the precious comments and links that you provided me. However, I would like to clarify my question. I am not interested in the sqrt function. I perform a series of operations in a loop. I know very well that the loops are bad, and vector operation is always preferable for them, but in this case I really need to do a loop. I will not go into the details of my problem, because it will add an unnecessary complication to this issue. I would like to split my loop so that each processor does part of it, which means that I could run my code 40 times with 1/40 loop each and merge the result, but that would be stupid. This is a short example.

  for i in arange(0,len(x)): # do some complicated stuff

I want to use 40 cpus for this:

  for npcu in arange(0,40): for i in arange(len(x)/40*ncpu,len(x)/40*(ncpu+1)): # do some complicated stuff

Is this possible with python?

+4

python numpy parallel-processing multiprocessing

Brian Jun 26 '12 at 15:25

source share

3 answers

Parallelization is not trivial, however you can find numexpr .

For numerical work , you really need to look into the numpy utility, which gives you ( vectorize and the like), this usually gives you good acceleration as the basis for the work.

For more complex, non-numerical cases, you can use multiprocessing (see comments).

In the sidestream, multithreading is even more non-trivial with python than with other languages, is that CPython has a Global Interpreter Lock (GIL) , which prohibits the simultaneous execution of two sections of python code in the same interpreter (i.e. there is no real multithreaded clean python code). For I / O and heavy computing, third-party libraries, however, tend to release this lock, so limited multithreading is possible.

This adds to the usual multithreaded inconvenience associated with accessing shared mutexes and the like.

+3

Jonas wielicki Jun 26 '12 at 15:28

source share

I would advise you to take a look at cython: http://www.cython.org/

It allows you to quickly create a c-extension for python and integrates very well with numpy. Here is a good tutorial that can help you get started: http://docs.cython.org/src/tutorial/numpy.html

0

Simon Jun 26 '12 at 16:44

source share

Nolen Royalty · Accepted Answer · 2012-06-26T15:51:43+0000

I'm not sure if you should do this the way I would expect numpy to have a much more efficient method for this, but do you just mean something like this?

 import numpy import multiprocessing x = numpy.linspace(0,20,10000) p = multiprocessing.Pool(processes=4) print p.map(numpy.sqrt, x)

Below are the timeit results for both solutions. However, as @SvenMarcach notes, with a more expensive feature, multiprocessing will become much more efficient.

 % python -m timeit -s 'import numpy; x=numpy.linspace(0,20,10000)' 'prop=[] for i in numpy.arange(0,len(x)): prop.append(numpy.sqrt(x[i]))' 10 loops, best of 3: 31.3 msec per loop % python -m timeit -s 'import numpy, multiprocessing; x=numpy.linspace(0,20,10000) p = multiprocessing.Pool(processes=4)' 'l = p.map(numpy.sqrt, x)' 10 loops, best of 3: 102 msec per loop

In the Sven request, here is the result l = numpy.sqrt(x) , which is significantly faster than any of the alternatives.

 % python -m timeit -s 'import numpy; x=numpy.linspace(0,20,10000)' 'l = numpy.sqrt(x)' 10000 loops, best of 3: 70.3 usec per loop

Need help for loop parallelization in python

More articles: