Python multithreading versus multiprocessing and sequential execution

I have the code below:

    import time
    from threading import Thread
    from multiprocessing import Process

    def fun1():
        for _ in xrange(10000000):
            print 'in fun1'
            pass

    def fun2():
        for _ in xrange(10000000):
            print 'in fun2'
            pass

    def fun3():
        for _ in xrange(10000000):
            print 'in fun3'
            pass

    def fun4():
        for _ in xrange(10000000):
            print 'in fun4'
            pass

    if __name__ == '__main__':
        #t1 = Thread(target=fun1, args=())
        t1 = Process(target=fun1, args=())
        #t2 = Thread(target=fun2, args=())
        t2 = Process(target=fun2, args=())
        #t3 = Thread(target=fun3, args=())
        t3 = Process(target=fun3, args=())
        #t4 = Thread(target=fun4, args=())
        t4 = Process(target=fun4, args=())

        t1.start()
        t2.start()
        t3.start()
        t4.start()

        start = time.clock()
        t1.join()
        t2.join()
        t3.join()
        t4.join()
        end = time.clock()
        print("Time Taken = ",end-start)

        '''
        start = time.clock()
        fun1()
        fun2()
        fun3()
        fun4()
        end = time.clock()
        print("Time Taken = ",end-start)
        '''

I ran the above program in three ways:

  • First, sequential execution ONLY (use the commented-out code and comment out the code above it)
  • Second, multithreaded execution ONLY
  • Third, multiprocessing execution ONLY

The observed total run times (end - start) are as follows:

  • ('Time Taken =', 342.5981313667716 ) --- Run time of threaded execution
  • ('Time Taken =', 232.94691744899296 ) --- Run time of sequential execution
  • ('Time Taken =', 307.91093406618216 ) --- Run time of multiprocessing execution

Question:

I see that sequential execution takes the least time and multithreading takes the most. Why? I cannot understand this, and I am surprised by the results. Please explain.

Since this is a CPU-intensive task and the GIL has to be acquired, I expected multiprocessing to take the least time and threading to take the most. Please confirm whether my understanding is correct.

2 answers

You are using time.clock, which gives you CPU time, not wall-clock (real) time. You cannot use it in your case, because it measures execution time (how long the CPU was busy running your code), which will be almost the same in each of these cases.
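To illustrate the difference between CPU time and wall-clock time, here is a minimal sketch. It assumes Python 3, where time.clock no longer exists, so time.process_time() stands in for CPU time and time.perf_counter() for real elapsed time. The helper names measure and wait_a_bit are made up for this illustration.

```python
import time

def measure(func):
    """Return (wall_seconds, cpu_seconds) spent while running func()."""
    wall_start = time.perf_counter()   # wall-clock (real) time
    cpu_start = time.process_time()    # CPU time of this process only
    func()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu

def wait_a_bit():
    # Sleeping uses almost no CPU, so it barely shows up in CPU time
    time.sleep(0.2)

wall, cpu = measure(wait_a_bit)
print("wall-clock: %.3f s, CPU: %.3f s" % (wall, cpu))
```

The sleeping function should show a wall-clock time of about 0.2 s while its CPU time stays close to zero, which is why a CPU-time clock is the wrong tool for comparing how long the three variants take to finish.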

Running the code using time.time() instead of time.clock gave me these timings on my computer:

    Process : ('Time Taken = ', 5.226783990859985)
    seq     : ('Time Taken = ', 6.3122560000000005)
    Thread  : ('Time Taken = ', 17.10062599182129)

The task given here (printing) is so lightweight that the speedup from using multiprocessing is almost cancelled out by its overhead.

With threading, only one thread runs at a time because of the GIL, so your functions are effectively executed sequentially, but you also pay the overhead of the threads (switching threads every few iterations can cost up to several milliseconds each time). So you end up with something much slower.
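A rough way to see this for yourself is sketched below. It assumes Python 3 on a standard CPython build with the GIL; the loop size N and the helper names count, run_sequential and run_threaded are hypothetical and chosen only for this illustration. It runs the same CPU-bound loop four times sequentially and then in four threads, measuring wall-clock time for both.

```python
import time
from threading import Thread

N = 500000  # hypothetical loop size, kept small so the sketch runs quickly

def count(n, out, idx):
    # Pure-Python CPU-bound loop: the thread holds the GIL while it runs
    total = 0
    for _ in range(n):
        total += 1
    out[idx] = total

def run_sequential(workers=4):
    out = [0] * workers
    start = time.perf_counter()
    for i in range(workers):
        count(N, out, i)
    return out, time.perf_counter() - start

def run_threaded(workers=4):
    out = [0] * workers
    threads = [Thread(target=count, args=(N, out, i)) for i in range(workers)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out, time.perf_counter() - start

seq_out, seq_time = run_sequential()
thr_out, thr_time = run_threaded()
print("sequential: %.3f s, threaded: %.3f s" % (seq_time, thr_time))
```

On a GIL build the threaded run is not expected to beat the sequential one for this kind of loop. The exact numbers depend on the machine and the Python version, so treat them as indicative only.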

Threading is useful when your tasks spend time waiting (for example on I/O), so that other tasks can run in the meantime.
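As a small sketch of that case (assuming Python 3; time.sleep is only a stand-in for a real wait such as a network or disk call, and the names wait_task and run_waiting_tasks are made up here):

```python
import time
from threading import Thread

def wait_task(seconds):
    # Stand-in for waiting on I/O; the GIL is released while sleeping
    time.sleep(seconds)

def run_waiting_tasks(n_tasks=4, seconds=0.2):
    threads = [Thread(target=wait_task, args=(seconds,)) for _ in range(n_tasks)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

elapsed = run_waiting_tasks()
print("4 tasks waiting 0.2 s each took %.3f s in total" % elapsed)
```

Because the waits overlap, the total should be close to the duration of a single wait rather than the sum of all four.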

Multiprocessing is useful for computationally expensive tasks that are, if possible, completely independent (no shared variables). If you need to share variables, you will have to deal with the GIL, and things get a bit more complicated (but not impossible most of the time).
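For independent CPU-bound work, something along these lines could be used. This is only a sketch assuming Python 3; the worker count_up, the wrapper run_in_processes and the loop size are hypothetical. Each worker process has its own interpreter, so the loops can run on separate cores.

```python
import time
from multiprocessing import Pool

def count_up(n):
    # Independent CPU-bound work: no variables shared with other processes
    total = 0
    for _ in range(n):
        total += 1
    return total

def run_in_processes(n, workers=4):
    # One task per worker; results are sent back to the parent process
    with Pool(processes=workers) as pool:
        return pool.map(count_up, [n] * workers)

if __name__ == "__main__":
    start = time.perf_counter()
    results = run_in_processes(500000)
    elapsed = time.perf_counter() - start
    print("results: %s, wall-clock: %.3f s" % (results, elapsed))
```

The `if __name__ == "__main__":` guard matters on platforms that start worker processes by re-importing the main module. For a loop this small, process startup may well outweigh any gain, which matches the timings above.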

EDIT: Actually, using time.clock as you did does tell you something: how much overhead Threading and Multiprocessing cost you.


In principle, you are right. What platform are you running the code snippet on? I guess Windows. Note that "print" is not CPU-bound, so you should comment out the print statements and try running it on Linux to see the difference (it should be what you expect). Use the following code:

    def fun1():
        for _ in xrange(10000000):
            # No print, and please run on Linux
            pass