Parallelism in Python is not working properly

Question

I developed an application on GAE using Python 2.7. An AJAX call requests some data from an API, and a single request can take ~200 ms. However, when I open two browsers and make two requests at almost the same time, they take more than twice as long. I tried putting everything in threads, but that did not help. (This happens when the application is deployed online, not just on the dev server.)

So I wrote this simple test to see whether this is a problem with Python in general (in the case of busy waiting). Here is the code and the result:

    import threading
    from datetime import datetime

    def work():
        # CPU-bound busy loop; prints the start time, end time, and duration.
        t = datetime.now()
        print threading.currentThread(), t
        i = 0
        while i < 100000000:
            i += 1
        t2 = datetime.now()
        print threading.currentThread(), t2, t2 - t

    if __name__ == '__main__':
        print "single threaded:"
        t1 = threading.Thread(target=work)
        t1.start()
        t1.join()

        print "multi threaded:"
        t1 = threading.Thread(target=work)
        t1.start()
        t2 = threading.Thread(target=work)
        t2.start()
        t1.join()
        t2.join()

Result on Mac OS X, Core i7 (4 cores, 8 threads), Python 2.7:

    single threaded:
    <Thread(Thread-1, started 4315942912)> 2011-12-06 15:38:07.763146
    <Thread(Thread-1, started 4315942912)> 2011-12-06 15:38:13.091614 0:00:05.328468
    multi threaded:
    <Thread(Thread-2, started 4315942912)> 2011-12-06 15:38:13.091952
    <Thread(Thread-3, started 4323282944)> 2011-12-06 15:38:13.102250
    <Thread(Thread-3, started 4323282944)> 2011-12-06 15:38:29.221050 0:00:16.118800
    <Thread(Thread-2, started 4315942912)> 2011-12-06 15:38:29.237512 0:00:16.145560

This is pretty shocking! A single thread takes about 5 seconds to do this. I thought that starting two threads at the same time would finish both tasks in roughly the same time, but it takes almost three times as long. This makes the whole idea of threading useless, as it would be faster to run the tasks sequentially!

What am I missing here?

+7

python multithreading google-app-engine python-2.7 python-multithreading

Mohamed khamis Dec 6 '11 at 17:01

3 answers

CPython does not allow more than one thread to execute Python bytecode at a time. Read about the GIL: http://wiki.python.org/moin/GlobalInterpreterLock

Thus, certain tasks, such as the CPU-bound loop in your test, cannot be run concurrently in an efficient way in CPython with threads.

If you want to do things in parallel on GAE, start them in parallel with separate requests.

Alternatively, you may want to check the Python parallel processing wiki: http://wiki.python.org/moin/ParallelProcessing

+4

bpgergo Dec 6 '11 at 17:09

I would look at where the time is going. Suppose, for example, that the server can only answer one request every 200 ms. Then there is nothing you can do: you will get only one response every 200 ms, because that is all the server can provide.
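One way to look at where the time goes is to time each stage of a request separately. The following is only a minimal sketch: `timed` and `timings` are hypothetical helpers, not part of any library, and the two stages are stand-ins for whatever the real handler does.

```python
import time
from contextlib import contextmanager

# Accumulated wall-clock seconds per label (hypothetical helper state).
timings = {}


@contextmanager
def timed(label):
    """Add the wall-clock duration of the enclosed block to timings[label]."""
    start = time.time()
    try:
        yield
    finally:
        timings[label] = timings.get(label, 0.0) + (time.time() - start)


if __name__ == '__main__':
    with timed('api_call'):      # hypothetical stage: waiting on the remote API
        time.sleep(0.2)
    with timed('processing'):    # hypothetical stage: local CPU work
        sum(range(100000))
    for label, seconds in sorted(timings.items()):
        print('%s: %.3f s' % (label, seconds))
```

If most of the time lands in the stage that waits on the remote server, threading on the client side will not help much.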

+1

David schwartz Dec 6 '11 at 17:09

unutbu · Accepted Answer · 2011-12-06T17:22:48+0000

David Beazley talked about this issue at PyCon 2010. As others have already said, for some tasks, using threading, especially with multiple cores, can result in slower performance than the same task performed by a single thread. Beazley found that the problem comes from threads on multiple cores fighting over the GIL, a "GIL battle".

To avoid GIL contention, you may get better results by running the tasks in separate processes instead of separate threads. The multiprocessing module provides a convenient way to do that, especially since the multiprocessing API is very similar to the threading API.

    import multiprocessing as mp
    import datetime as dt

    def work():
        t = dt.datetime.now()
        print mp.current_process().name, t
        i = 0
        while i < 100000000:
            i += 1
        t2 = dt.datetime.now()
        print mp.current_process().name, t2, t2 - t

    if __name__ == '__main__':
        print "single process:"
        t1 = mp.Process(target=work)
        t1.start()
        t1.join()

        print "multi process:"
        t1 = mp.Process(target=work)
        t1.start()
        t2 = mp.Process(target=work)
        t2.start()
        t1.join()
        t2.join()

gives

    single process:
    Process-1 2011-12-06 12:34:20.611526
    Process-1 2011-12-06 12:34:28.494831 0:00:07.883305
    multi process:
    Process-3 2011-12-06 12:34:28.497895
    Process-2 2011-12-06 12:34:28.503433
    Process-2 2011-12-06 12:34:36.458354 0:00:07.954921
    Process-3 2011-12-06 12:34:36.546656 0:00:08.048761
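With more than a couple of CPU-bound tasks, the same idea can be written with a process pool. This is a minimal sketch rather than part of the measurements above: `count_to` and `run_in_pool` are hypothetical names, and the only library call relied on is `multiprocessing.Pool` with its `map`, `close` and `join` methods.

```python
import multiprocessing as mp


def count_to(n):
    """CPU-bound busy loop in the style of work(); returns the final counter."""
    i = 0
    while i < n:
        i += 1
    return i


def run_in_pool(sizes, processes=2):
    """Run count_to once per entry in `sizes`, spread over worker processes.

    Each worker is a separate interpreter with its own GIL, so the loops
    do not contend with each other the way threads do.
    """
    pool = mp.Pool(processes=processes)
    try:
        return pool.map(count_to, sizes)
    finally:
        pool.close()
        pool.join()
```

As with the `mp.Process` version, `run_in_pool([100000000, 100000000])` should be called from under an `if __name__ == '__main__':` guard so that worker processes can import the module safely.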

PS. As zeekay pointed out in the comments, the GIL battle is a severe problem only for CPU-bound tasks. It should not be a problem for IO-bound tasks.
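To illustrate the IO-bound case, here is a minimal sketch in which `time.sleep` stands in for a blocking IO call. A sleeping thread releases the GIL, so the waits overlap instead of competing. `fake_io` and `timed_threads` are hypothetical names used only for this example.

```python
import threading
import time


def fake_io(delay):
    """Stand-in for a blocking IO call; the GIL is released while sleeping."""
    time.sleep(delay)


def timed_threads(count, delay):
    """Start `count` threads that each wait `delay` seconds; return wall time."""
    threads = [threading.Thread(target=fake_io, args=(delay,))
               for _ in range(count)]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.time() - start


if __name__ == '__main__':
    # The two 0.5 s waits overlap, so the total should be close to 0.5 s
    # rather than 1 s.
    print(timed_threads(2, 0.5))
```

Replacing `fake_io` with the busy loop from the question would bring back the slowdown, since that loop holds the GIL while it runs.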
