Why do we need locks for threads if we have a GIL?

Question

I think this is a stupid question, but I still can't find the answer. It is actually better to split it into two questions:

1) Is it correct that I can have many threads, but because of the GIL only one thread is executing at any given time?

2) If so, why do we still need locks? We use locks to avoid the case where two threads try to read/write some shared object, but because of the GIL two threads cannot be executing at the same moment, can they?

+8

python multithreading

Paul Oct 16 '16 at 16:54

3 answers

At any moment, yes, only one thread executes Python code (other threads may be doing I/O, running NumPy code, and so on). That is mostly true. However, it is trivially true for any single-processor system, and yet people still need locks on single-processor systems.

Take a look at the following code:

    queue = []

    def do_work():
        while queue:
            item = queue.pop(0)
            process(item)

With one thread, everything is fine. With two threads, you can get an exception from queue.pop(), because the other thread may have called queue.pop() on the last element first, after your thread had already checked that the queue was not empty. So you need to handle that somehow. Using a lock is a simple solution. You can also use a proper concurrent queue (for example, from the queue module), but if you look inside the queue module, you will find that the Queue object has a threading.Lock() inside it. So either way, you are using locks.
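A minimal sketch of the lock-based fix, assuming four worker threads and a placeholder process() that just records a doubled value. The names queue_lock, results, and results_lock are made up for this illustration and are not from the answer above.

```python
import threading

queue = list(range(1000))        # hypothetical work items
queue_lock = threading.Lock()    # guards every access to `queue`
results = []
results_lock = threading.Lock()  # guards `results`


def process(item):
    # Placeholder for real work (an assumption for this sketch).
    with results_lock:
        results.append(item * 2)


def do_work():
    while True:
        # Check and pop as one step, so no other thread can empty
        # the queue between the emptiness check and the pop.
        with queue_lock:
            if not queue:
                return
            item = queue.pop(0)
        process(item)  # done outside the queue lock so workers can overlap


workers = [threading.Thread(target=do_work) for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()

print(len(results))  # 1000
```

The important part is that the emptiness check and the pop happen under the same lock; locking only the pop would still leave the original race.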

A common beginner mistake is to write multi-threaded code without the necessary locks. You look at the code and think, "this will work fine," and then find out hours later that something truly bizarre happened because the threads were not synchronized properly.

Or, in short, there are many places in a multi-threaded program where you need to prevent another thread from modifying a structure until you have finished making your own changes. This lets you maintain the invariants of your data, and if you cannot maintain invariants, then it is basically impossible to write correct code.

Or, in the shortest form possible: "You don't need locks if you don't care whether your code is correct."

+3

Dietrich epp Oct 16 '16 at 17:08

The GIL prevents multiple threads from executing simultaneously, but not in all situations.

The GIL is temporarily released during I/O operations performed by threads. This means that multiple threads can make progress at the same time. This is one of the reasons you still need locks.

I don't remember where I learned this (in a video or something like that), so it is hard to point to a source, but you can investigate further yourself.
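A small sketch of the point about the GIL being released while a thread waits. Here time.sleep() stands in for a blocking I/O call (that substitution is an assumption of this example); like blocking I/O, it releases the GIL while waiting, so the four waits overlap instead of running one after another.

```python
import threading
import time


def fake_io():
    # Stand-in for a blocking I/O call; the GIL is released while waiting.
    time.sleep(0.2)


start = time.perf_counter()
threads = [threading.Thread(target=fake_io) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

# If the waits were serialized this would take about 0.8s;
# because they overlap it takes roughly 0.2s.
print(f"4 waits of 0.2s took {elapsed:.2f}s")
```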

+2

vlad-ardelean Oct 16 '16 at 16:58

zvone · Accepted Answer · 2016-10-16T17:07:55+0000

The GIL protects the Python internals. That means:

you don't have to worry about something in the interpreter going wrong because of multithreading
most things do not really run in parallel, because Python code is executed sequentially due to the GIL

But GIL does not protect your own code. For example, if you have this code:

self.some_number += 1

This will read the value of self.some_number , compute some_number+1 , and then write it back to self.some_number .

If you do this in two threads, the operations (read, add, write) of one thread and the other can be mixed so that the result is incorrect.
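One way to see that the increment is not a single step is to disassemble it. This is a sketch with a made-up Counter class; the exact opcode names differ between CPython versions, so treat the printed listing as illustrative only. What matters is that the attribute load and the attribute store are separate instructions, and another thread may be scheduled in between them.

```python
import dis


class Counter:
    def __init__(self):
        self.some_number = 0

    def increment(self):
        self.some_number += 1


# Print the bytecode of the one-line increment.
dis.dis(Counter.increment)

# Collect opcode names to show that the read and the write are distinct steps.
opnames = [ins.opname for ins in dis.get_instructions(Counter.increment)]
load_index = next(i for i, name in enumerate(opnames) if name.startswith("LOAD_ATTR"))
store_index = next(i for i, name in enumerate(opnames) if name.startswith("STORE_ATTR"))
print(load_index < store_index)  # the read comes before the write
```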

This may be the order of execution:

thread1 reads self.some_number (0)
thread2 reads self.some_number (0)
thread1 computes some_number+1 (1)
thread2 computes some_number+1 (1)
thread1 writes 1 to self.some_number
thread2 writes 1 to self.some_number

You use locks to enforce this order of execution:

thread1 reads self.some_number (0)
thread1 computes some_number+1 (1)
thread1 writes 1 to self.some_number
thread2 reads self.some_number (1)
thread2 computes some_number+1 (2)
thread2 writes 2 to self.some_number
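The two orderings above can be replayed deterministically without any real threads. This is only a simulation: run_schedule and the step names are invented for this sketch, and each "thread" is just a key in a dictionary holding its private copy of the value.

```python
def run_schedule(schedule):
    """Replay a fixed order of (thread, step) pairs on a shared counter."""
    shared = {"some_number": 0}
    local = {}  # each thread's private working value
    for thread, step in schedule:
        if step == "read":
            local[thread] = shared["some_number"]
        elif step == "compute":
            local[thread] += 1
        elif step == "write":
            shared["some_number"] = local[thread]
    return shared["some_number"]


# Both threads read 0 before either one writes: one increment is lost.
interleaved = [
    ("t1", "read"), ("t2", "read"),
    ("t1", "compute"), ("t2", "compute"),
    ("t1", "write"), ("t2", "write"),
]

# The order a lock enforces: t1 finishes completely before t2 starts.
serialized = [
    ("t1", "read"), ("t1", "compute"), ("t1", "write"),
    ("t2", "read"), ("t2", "compute"), ("t2", "write"),
]

print(run_schedule(interleaved))  # 1
print(run_schedule(serialized))   # 2
```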

EDIT: Let me complete this answer with some code that shows the behavior explained above:

    import threading
    import time

    total = 0
    lock = threading.Lock()

    def increment_n_times(n):
        global total
        for i in range(n):
            total += 1

    def safe_increment_n_times(n):
        global total
        for i in range(n):
            lock.acquire()
            total += 1
            lock.release()

    def increment_in_x_threads(x, func, n):
        threads = [threading.Thread(target=func, args=(n,)) for i in range(x)]
        global total
        total = 0
        begin = time.time()
        for thread in threads:
            thread.start()
        for thread in threads:
            thread.join()
        print('finished in {}s.\ntotal: {}\nexpected: {}\ndifference: {} ({} %)'
              .format(time.time()-begin, total, n*x, n*x-total, 100-total/n/x*100))

There are two functions that implement the increment. One uses locks, and the other does not.

The increment_in_x_threads function runs the given increment function in parallel in x threads and reports how far the final total is from the expected value.

Now doing this with lots of threads makes it almost certain that an error will occur:

    print('unsafe:')
    increment_in_x_threads(70, increment_n_times, 100000)

    print('\nwith locks:')
    increment_in_x_threads(70, safe_increment_n_times, 100000)

In my case, it printed:

    unsafe:
    finished in 0.9840562343597412s.
    total: 4654584
    expected: 7000000
    difference: 2345416 (33.505942857142855 %)

    with locks:
    finished in 20.564176082611084s.
    total: 7000000
    expected: 7000000
    difference: 0 (0.0 %)

So, without locks, there were many errors (about 33% of the increments were lost). On the other hand, with locks it was about 20 times slower.

Of course, both numbers are exaggerated because I used 70 threads, but this shows the general idea.
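As a side note, the locked increment can also be written with the lock as a context manager, which releases the lock even if the body raises. This is a sketch of that variant, not the code that produced the timings above; the function name increment_with_lock and the thread and iteration counts are chosen only for this example.

```python
import threading

total = 0
lock = threading.Lock()


def increment_with_lock(n):
    global total
    for _ in range(n):
        # `with lock:` acquires on entry and always releases on exit.
        with lock:
            total += 1


threads = [threading.Thread(target=increment_with_lock, args=(10000,))
           for _ in range(8)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(total)  # 80000
```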
