Threading in Python - What am I missing?

This is my first attempt at threads in Python... and it worked too well :) I wanted to reproduce the classic critical-section problem, but found that this code does not actually exhibit it.

Question: why do I see no problem when incrementing the counter? Shouldn't the counter end up with a random value after the run? I can only explain this if the increment is already atomic, or if the threads are not actually running in parallel...

    import threading
    import time

    turnstile_names = ["N", "E", "S", "W"]
    count = 0

    class Counter(threading.Thread):
        def __init__(self, id):
            threading.Thread.__init__(self)
            self.id = id

        def run(self):
            global count
            for i in range(20):
                #self.sem.acquire()
                count = count + 1
                #self.sem.release()

    def main():
        sem = threading.Semaphore(1)
        counters = [Counter(name) for name in turnstile_names]
        for counter in counters:
            counter.start()
        # We're running!
        for counter in counters:
            counter.join()
        print count
        return 0

    if __name__ == '__main__':
        main()

Notes: I left the acquire() and release() calls commented out so I could check the difference. I also tried forcing a thread switch by adding a short sleep after the increment -- no difference.
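For comparison, here is a minimal Python 3 sketch (not the original code) of what uncommenting those calls achieves. It assumes a plain `threading.Lock`; a `Semaphore(1)` would behave the same way. With the lock held around the read-modify-write, the result is deterministic:

```python
import threading

count = 0
lock = threading.Lock()  # assumption: a Lock instead of the original Semaphore(1)

def run(n):
    global count
    for _ in range(n):
        with lock:            # serializes the whole read-modify-write
            count = count + 1

threads = [threading.Thread(target=run, args=(100000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(count)  # 400000 every time, since no increment can be interleaved
```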

Solution / tests: thanks, Kevin (see the accepted answer below). I tested different loop counts and got the following:

    Loops   Result
    20      99% of the time 80. Sometimes 60.
    200     99% of the time 800. Sometimes 600.
    2000    Maybe 10% of the time a different value.
    20000   Finally... random numbers! I have yet to see 80000 or 60000.
            All values are now random, as originally expected.

I suspect this means the per-thread overhead is on the order of 10^4 increment operations.

Another interesting test (well, at least in my opinion):

I added time.sleep(random.random()/divisor) after the increment and, with the loop count back at 20, found:

    Divisor    Result
    100        Always 4, so the race condition is always there.
    1000       95% of the time 4, sometimes 3 or 5 (once 7).
    10000      99% of the time NOT 4, varying from 4 to 13.
    100000     Basically the same as 10000.
    1000000    Varying from 10 to 70.
    10000000   Same as the previous... (even with time.sleep(0)).

If you increase the number of iterations in the thread:

    def run(self):
        global count
        for i in range(100000):
            #self.sem.acquire()
            count = count + 1
            #self.sem.release()

then the race condition appears. Your script prints, for example, 175165 when 400000 was expected. This demonstrates that the increment is not atomic.
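A self-contained Python 3 version of the same experiment (a sketch, not the answer's exact script) makes this easy to try. The final value is usually well below the expected total, though how much is lost depends on scheduling:

```python
import threading

count = 0

def run(n):
    global count
    for _ in range(n):
        count = count + 1  # unprotected read-modify-write

threads = [threading.Thread(target=run, args=(100000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# Usually less than 400000; it can never exceed it.
print(count)
```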


Additional evidence that the increment is not atomic: threading behavior in CPython is governed by the Global Interpreter Lock. According to the wiki:

the global interpreter lock, or GIL, is a mutex that prevents multiple native threads from executing Python bytecodes at once.

Since the GIL has bytecode-level granularity, we would expect the increment not to be atomic, because it takes more than one bytecode to execute, as the dis module demonstrates:

    >>> import dis
    >>> def f():
    ...     x = 0
    ...     x = x + 1
    ...
    >>> dis.dis(f)
      2           0 LOAD_CONST               1 (0)
                  3 STORE_FAST               0 (x)

      3           6 LOAD_FAST                0 (x)
                  9 LOAD_CONST               2 (1)
                 12 BINARY_ADD
                 13 STORE_FAST               0 (x)
                 16 LOAD_CONST               0 (None)
                 19 RETURN_VALUE

Here, the increment is performed by the bytecodes at offsets 6 through 13.
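The same check can be made programmatically in Python 3 (a sketch; the exact opcode name varies by version, e.g. BINARY_ADD was folded into BINARY_OP in 3.11):

```python
import dis

def f():
    x = 0
    x = x + 1

# Collect the opcode names that make up the function body.
ops = [ins.opname for ins in dis.get_instructions(f)]
print(ops)

# The increment spans several opcodes: a load, an add, and a store,
# so a thread switch can land in the middle of it.
```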


So why didn't the original code show the race condition? Apparently because of the short lifetime of each thread: looping only 20 times, each thread finishes its work and dies before the next thread even starts its own.


In CPython, thread safety is determined by atomicity (a single bytecode is never interrupted), the GIL (the interpreter lets one thread run for roughly 100 ticks before considering a switch), and luck. Disassembling a simpler function:

    >>> import dis
    >>> count = 0
    >>> def x():
    ...     count = count + 1
    ...
    >>> dis.dis(x)
      2           0 LOAD_FAST                0 (count)
                  3 LOAD_CONST               1 (1)
                  6 BINARY_ADD
                  7 STORE_FAST               0 (count)
                 10 LOAD_CONST               0 (None)
                 13 RETURN_VALUE

We can see that the code may be interrupted between the load and the store. That means one thread can load the value, get paused, and later overwrite a larger value with its stale result.
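The lost update can be acted out step by step. This sketch fakes two threads by spelling out the load and store phases of each increment in the unlucky order:

```python
# Simulate the interleaving the bytecode gap allows: both "threads"
# load the same value of count before either stores its result.
count = 0

# Thread A executes its LOAD_FAST (reads 0), then is preempted...
a_loaded = count

# ...thread B runs its whole increment: load, add, store.
b_loaded = count
count = b_loaded + 1   # count is now 1

# Thread A resumes with its stale value and stores over B's work.
count = a_loaded + 1   # count is 1, not 2: B's increment is lost

print(count)  # 1
```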

Now luck comes into play. Running the loop only 20 times is not much. Let's change your code to take the loop count as a command-line parameter and see what happens with larger values:

    import threading
    import time
    import sys

    turnstile_names = ["N", "E", "S", "W"]
    count = 0

    class Counter(threading.Thread):
        def __init__(self, id):
            threading.Thread.__init__(self)
            self.id = id

        def run(self):
            global count
            for i in range(int(sys.argv[1])):
                #self.sem.acquire()
                count = count + 1
                #self.sem.release()

    def main():
        sem = threading.Semaphore(1)
        counters = [Counter(name) for name in turnstile_names]
        for counter in counters:
            counter.start()
        # We're running!
        for counter in counters:
            counter.join()
        print count
        return 0

    if __name__ == '__main__':
        main()

A few runs:

    td@timsworld2:~/tmp/so$ python count.py 1
    4
    td@timsworld2:~/tmp/so$ python count.py 2
    8
    td@timsworld2:~/tmp/so$ python count.py 20
    80
    td@timsworld2:~/tmp/so$ python count.py 200
    749
    td@timsworld2:~/tmp/so$ python count.py 2000
    4314

With 2000 iterations across 4 threads, almost half of the increments were lost.

