Parallelism in Python is not working properly

I developed an application on GAE using Python 2.7. An AJAX call requests some data from the API, and a single request takes about 200 ms. However, when I open two browsers and make two requests at nearly the same time, each one takes more than twice as long. I tried putting everything in threads, but that did not help. (This happens when the application is running live, not only on the dev server.)

So I wrote this simple test to see whether this is a problem with Python in general (in the busy-waiting case). Here are the code and the result:

    import threading
    from datetime import datetime

    def work():
        # Busy loop: pure CPU work, no I/O
        t = datetime.now()
        print threading.currentThread(), t
        i = 0
        while i < 100000000:
            i += 1
        t2 = datetime.now()
        print threading.currentThread(), t2, t2 - t

    if __name__ == '__main__':
        print "single threaded:"
        t1 = threading.Thread(target=work)
        t1.start()
        t1.join()

        print "multi threaded:"
        t1 = threading.Thread(target=work)
        t1.start()
        t2 = threading.Thread(target=work)
        t2.start()
        t1.join()
        t2.join()

Result on Mac OS X, Core i7 (4 cores, 8 threads), Python 2.7:

    single threaded:
    <Thread(Thread-1, started 4315942912)> 2011-12-06 15:38:07.763146
    <Thread(Thread-1, started 4315942912)> 2011-12-06 15:38:13.091614 0:00:05.328468
    multi threaded:
    <Thread(Thread-2, started 4315942912)> 2011-12-06 15:38:13.091952
    <Thread(Thread-3, started 4323282944)> 2011-12-06 15:38:13.102250
    <Thread(Thread-3, started 4323282944)> 2011-12-06 15:38:29.221050 0:00:16.118800
    <Thread(Thread-2, started 4315942912)> 2011-12-06 15:38:29.237512 0:00:16.145560

This is pretty shocking! One thread takes 5 seconds to do this, so I thought that starting two threads at the same time would finish both tasks in about the same time, but it takes almost three times as long. This makes the whole idea of threading useless, since it would be faster to run the tasks sequentially!

What am I missing here?

3 answers

David Beazley gave a talk about this issue at PyCon 2010. As others have already noted, for some tasks, using threads, especially on multiple cores, can give slower performance than a single thread doing the same work. The problem, Beazley found, is that multiple cores get into a "GIL battle".

To avoid GIL contention, you may get better results by running the tasks in separate processes instead of separate threads. The multiprocessing module provides a convenient way to do that, especially since the multiprocessing API is very similar to the threading API.

    import multiprocessing as mp
    import datetime as dt

    def work():
        t = dt.datetime.now()
        print mp.current_process().name, t
        i = 0
        while i < 100000000:
            i += 1
        t2 = dt.datetime.now()
        print mp.current_process().name, t2, t2 - t

    if __name__ == '__main__':
        print "single process:"
        t1 = mp.Process(target=work)
        t1.start()
        t1.join()

        print "multi process:"
        t1 = mp.Process(target=work)
        t1.start()
        t2 = mp.Process(target=work)
        t2.start()
        t1.join()
        t2.join()

gives

    single process:
    Process-1 2011-12-06 12:34:20.611526
    Process-1 2011-12-06 12:34:28.494831 0:00:07.883305
    multi process:
    Process-3 2011-12-06 12:34:28.497895
    Process-2 2011-12-06 12:34:28.503433
    Process-2 2011-12-06 12:34:36.458354 0:00:07.954921
    Process-3 2011-12-06 12:34:36.546656 0:00:08.048761

PS. As zeekay pointed out in the comments, the GIL battle is a serious problem for CPU-bound tasks. It should not be a problem for I/O-bound tasks.
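To illustrate the I/O-bound case, here is a minimal sketch. It assumes `time.sleep` is an acceptable stand-in for a blocking I/O call (in CPython both release the GIL while waiting). The names `fake_io` and `run_threads` are made up for this example and are not part of the code above.

```python
from __future__ import print_function  # keeps the sketch valid on Python 2.7 too

import threading
import time


def fake_io(delay):
    # Assumption: time.sleep stands in for a blocking I/O call.
    # Like blocking I/O in CPython, it releases the GIL while waiting.
    time.sleep(delay)


def run_threads(n, delay):
    """Run n waiting threads at once and return the wall-clock time."""
    threads = [threading.Thread(target=fake_io, args=(delay,)) for _ in range(n)]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.time() - start


elapsed = run_threads(4, 0.2)
# Run one after another, the waits would add up to about 4 * 0.2 = 0.8 s.
# With threads the waits overlap, so the total should stay near 0.2 s.
print("4 threads, 0.2 s each: %.2f s" % elapsed)
```

If the threads were doing the busy loop from the question instead of waiting, they would contend for the GIL and this overlap would disappear.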


CPython does not allow more than one thread to execute Python code at a time. Read about the GIL: http://wiki.python.org/moin/GlobalInterpreterLock

Thus, certain tasks cannot be performed simultaneously in an efficient manner in CPython with threads.

If you want to do things in parallel on GAE, start them in parallel as separate requests.

Alternatively, you may want to check the Python wiki page on parallel processing: http://wiki.python.org/moin/ParallelProcessing


I would look at where the time goes. Suppose, for example, that the server can only answer one request every 200 ms. Then there is nothing you can do: you will get only one reply every 200 ms, because that is all the server can provide.
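One rough way to see where the time goes is to wrap the suspect sections in a small timer. This is only a sketch: `timed`, `timings`, and `handle_request` are hypothetical names, and the sleeps stand in for whatever the real handler does.

```python
from __future__ import print_function  # keeps the sketch valid on Python 2.7 too

import time
from contextlib import contextmanager

timings = {}  # hypothetical store: label -> list of elapsed seconds


@contextmanager
def timed(label):
    """Record how long the enclosed block takes under the given label."""
    start = time.time()
    try:
        yield
    finally:
        timings.setdefault(label, []).append(time.time() - start)


def handle_request():
    # Hypothetical handler: the sleeps stand in for a backend call
    # and for building the response.
    with timed("backend"):
        time.sleep(0.05)
    with timed("render"):
        time.sleep(0.01)


handle_request()
for label, samples in sorted(timings.items()):
    print("%-8s %.3f s" % (label, sum(samples)))
```

If one section accounts for nearly all of the 200 ms and does not shrink when requests overlap, the bottleneck is probably there rather than in the threading code.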
