Releasing multiple locks without causing priority inversion

Short version: how can I release multiple locks from a single thread without being pre-empted halfway through?

I have a program that is designed to run on an N-core machine. It consists of one main thread and N worker threads. Each thread (including the main thread) has a semaphore it can block on. Normally, each worker thread is blocked decrementing its semaphore while the main thread runs. Every now and then, though, the main thread needs to wake the worker threads to do their thing for a certain amount of time, then block on its own semaphore waiting for them all to go back to sleep. For instance:

```
def main_thread(n):
    for i = 1 to n:
        worker_semaphore[i] = semaphore(0)
        spawn_thread(worker_thread, i)
    main_semaphore = semaphore(0)
    while True:
        ...do some work...
        workers_to_wake = foo()
        for i in workers_to_wake:
            worker_semaphore[i].increment()  # wake up worker i
        for i in workers_to_wake:
            main_semaphore.decrement()       # wait for all workers

def worker_thread(i):
    while True:
        worker_semaphore[i].decrement()      # wait to be woken
        ...do some work...
        main_semaphore.increment()           # report done with step
```

All is well and good. The trouble is that one of the woken workers may end up pre-empting the main thread halfway through waking the workers: this can happen, for instance, when the Windows scheduler decides to boost that worker's priority. This doesn't lead to deadlock, but it is inefficient, because the rest of the threads stay asleep until the pre-empting worker finishes its work. It is basically priority inversion, with the main thread waiting on one of the workers while some of the worker threads are waiting on the main thread.

I can probably find OS- and scheduler-specific hacks for this, such as disabling priority boosting under Windows and fiddling with thread priorities and processor affinities, but I would like something cross-platform, reliable and clean. So: how can I wake up a bunch of threads atomically?

4 answers

Peter Brittain's solution, plus Anton's suggestion of a "tree-like wakeup", led me to another solution: chained wakeups. Basically, rather than the main thread doing all the wakeups, it wakes up only one thread; each thread is then responsible for waking the next one. The elegant bit here is that there is only ever one suspended thread ready to run, so threads rarely end up switching cores. In fact, this works fine with strict processor affinities, even if one of the worker threads shares an affinity with the main thread.

The other thing I did was to use an atomic counter that the worker threads decrement before sleeping; that way, only the last one of them wakes the main thread, so there is also no chance of the main thread waking up several times only to block again waiting for more semaphore decrements.

```
workers_to_wake = []
main_semaphore = semaphore(0)
num_woken_workers = atomic_integer()

def main_thread(n):
    for i = 1 to n:
        worker_semaphore[i] = semaphore(0)
        spawn_thread(worker_thread, i)
    while True:
        ...do some work...
        workers_to_wake = foo()
        num_woken_workers.atomic_set(len(workers_to_wake))  # set completion countdown
        one_to_wake = workers_to_wake.pop()
        worker_semaphore[one_to_wake].increment()           # wake the first worker
        main_semaphore.decrement()                          # wait for all workers

def worker_thread(i):
    while True:
        worker_semaphore[i].decrement()                     # wait to be woken
        if workers_to_wake.len() > 0:                       # more pending wakeups
            one_to_wake = workers_to_wake.pop()
            worker_semaphore[one_to_wake].increment()       # wake the next worker
        ...do some work...
        if num_woken_workers.atomic_decrement() == 0:       # see whether we're the last one
            main_semaphore.increment()                      # report all done with step
```
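A runnable Python sketch of this chained-wakeup scheme for a single step is below. Python has no built-in atomic integer, so the countdown and the pending-wakeup list are guarded by one lock instead; all names and the pool size are illustrative.

```python
import threading

# Runnable sketch of the chained wakeup for one "frame" with a fixed pool
# of workers. One lock stands in for the atomic counter of the pseudo-code.

N = 4
worker_semaphores = [threading.Semaphore(0) for _ in range(N)]
main_semaphore = threading.Semaphore(0)
wake_lock = threading.Lock()       # guards workers_to_wake and num_woken
workers_to_wake = []
num_woken = 0
results = []                       # stand-in for "...do some work..."

def worker(i):
    global num_woken
    worker_semaphores[i].acquire()            # wait to be woken
    with wake_lock:
        if workers_to_wake:                   # more pending wakeups?
            nxt = workers_to_wake.pop()
            worker_semaphores[nxt].release()  # wake the next worker
    results.append(i)                         # do this worker's step
    with wake_lock:
        num_woken -= 1
        last = (num_woken == 0)
    if last:
        main_semaphore.release()              # last one reports the step done

threads = [threading.Thread(target=worker, args=(i,)) for i in range(N)]
for t in threads:
    t.start()

# The main thread schedules all four workers but wakes only the first;
# the remaining wakeups are chained through the workers themselves.
with wake_lock:
    workers_to_wake = [1, 2, 3]
    num_woken = N
worker_semaphores[0].release()
main_semaphore.acquire()                      # wait for all workers
for t in threads:
    t.join()
```

Note that only one newly-woken thread is runnable at any moment, which is the whole point of the chain.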

TL;DR

If you really need to squeeze the most out of your workers, just use an event, a control block and a barrier instead of your semaphores. Note, though, that this is a more fragile solution, so you need to weigh any potential gains against that downside.

Context

First I need to summarize the broader context from our discussion...

You have a Windows graphics application. It has a desired frame rate, so you need the main thread to run at that rate, scheduling all your workers at precisely timed intervals so that they complete their work within the update period. This means you have very tight constraints on the start and execution times for each thread. In addition, your worker threads are not all alike, so you can't just use a single work queue.

Problem

Like any modern operating system, Windows has many synchronization primitives. However, none of them directly provides a mechanism for notifying multiple primitives at once. Looking through other operating systems, I see a similar pattern; they all provide ways of waiting on multiple primitives, but none of them provides an atomic way of triggering them.

So what can we do instead? The problems you need to solve are:

  • Precise timing of the start of all required workers.
  • Prodding only the workers that actually need to run in the next frame.

Options

The most obvious solution to problem 1 is to use a single event, but you could also use a read/write lock (acquiring the write lock once the workers have finished their work and having the workers use read locks). All the other options are no longer atomic, and so will need extra synchronization to force the threads to do what you want - for example, the suggestion in another answer of combining locks with semaphores.

But we want an optimal solution that minimizes context switches, because of the tight time constraints on your application, so let's see if either of those can be used to solve problem 2... How can you pick which worker threads should run from the main thread if all we have is an event or a read/write lock?

Well... a read/write lock is a great way for one thread to write some critical data to a control block and for many others to read from it. So why not just have a simple array of boolean flags (one per worker thread) that your main thread updates every frame? Unfortunately, you still need to stop the workers from running until the timer fires. In short, we are back to the semaphore-and-lock solution again.

However, given the nature of your application, you can take one more step. You can rely on the fact that you know your workers are not running outside of your timed slots, and so use the event as a cruder form of lock instead.

The final optimization (if your environment supports it) is to use a barrier instead of the main semaphore. You know that all n threads must be idle before you can continue, so just insist on that.

Solution

Putting all of this together, your pseudo-code would look something like this:

```
def main_thread(n):
    main_event = event()
    for i = 1 to n:
        worker_scheduled[i] = False
        spawn_thread(worker_thread, i)
    main_barrier = barrier(n+1)
    while True:
        ...do some work...
        workers_to_wake = foo()
        for i in workers_to_wake:
            worker_scheduled[i] = True
        main_event.set()
        main_barrier.enter()   # wait for all workers
        main_event.reset()

def worker_thread(i):
    while True:
        main_event.wait()
        if worker_scheduled[i]:
            worker_scheduled[i] = False
            ...do some work...
        main_barrier.enter()   # report finished for this frame
        main_event.reset()     # to catch the case that a worker is scheduled before the main thread
```

Since there is no explicit protection for the worker_scheduled array, this is a much more fragile solution.

So, personally, I would only use it if I had to squeeze every last ounce of processing out of my CPU, but it sounds like that is exactly what you are after.
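A runnable single-frame Python sketch of this event-plus-barrier scheme, using `threading.Event` and `threading.Barrier`; the names, pool size and which workers get scheduled are all illustrative:

```python
import threading

# Single-frame sketch: workers 0 and 2 are scheduled; the others wake,
# see their flag unset, and go straight to the barrier.

N = 4
main_event = threading.Event()
main_barrier = threading.Barrier(N + 1)   # all n workers plus the main thread
worker_scheduled = [False] * N
ran = []                                  # stand-in for "...do some work..."

def worker(i):
    main_event.wait()                     # wait for the frame to start
    if worker_scheduled[i]:
        worker_scheduled[i] = False
        ran.append(i)                     # do this worker's step
    main_barrier.wait()                   # report finished for this frame

threads = [threading.Thread(target=worker, args=(i,)) for i in range(N)]
for t in threads:
    t.start()

for i in (0, 2):                          # schedule only these workers
    worker_scheduled[i] = True
main_event.set()                          # wake everyone at once
main_barrier.wait()                       # wait for all workers
main_event.clear()
for t in threads:
    t.join()
```

As the answer warns, nothing protects `worker_scheduled` here except the frame timing itself; the barrier is what makes the flag rewrites safe between frames.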


This is not possible when using multiple synchronization objects (semaphores), where the complexity of the wake-up algorithm is inherently O(n). There are a few ways to solve the problem, though.

Release them all at once

I'm not sure if Python has the required method (is your question Python-specific?), but in general, semaphores have operations that take an argument specifying the number of decrements/increments. Thus, you just put all your threads on the same semaphore and wake them all together. A similar approach is to use a condition variable and notify all waiters.
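As it happens, Python does have this: since Python 3.9, `threading.Semaphore.release()` accepts a count. A minimal sketch with a single shared semaphore (names are illustrative):

```python
import threading

# "Release them all at once": one shared start semaphore released N times
# in a single call, and a second semaphore counting the completions.

N = 4
start = threading.Semaphore(0)
done = threading.Semaphore(0)
ran = []                      # stand-in for "...do some work..."

def worker(i):
    start.acquire()           # wait to be woken
    ran.append(i)             # do this worker's step
    done.release()            # report done with step

threads = [threading.Thread(target=worker, args=(i,)) for i in range(N)]
for t in threads:
    t.start()

start.release(N)              # one call wakes all N workers together
for _ in range(N):
    done.acquire()            # wait for all workers
for t in threads:
    t.join()
```

The obvious limitation is that all workers share one semaphore, so you can no longer pick which of them run; whichever N threads are blocked get released.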

Event loops

If you still want to be able to control each thread individually but like the one-to-many notification approach, try libraries for asynchronous I/O such as libuv (and its Python counterpart). Here you can create a single event that wakes all threads at once, plus a separate event for each thread, and then simply wait on both (or more) event objects in an event loop in each thread. There is also the pevents library, which implements WaitForMultipleObjects on top of pthreads condition variables.

Delegate the waking up

Another approach is to replace your O(n) wake-up algorithm with a tree-like one (O(log n)), where each thread wakes only a fixed number of other threads but delegates the rest of the wakeups to them. In the edge case, the main thread can wake only one other thread, which then wakes everyone else or starts a chain reaction. This can be useful when you want to minimize the latency for the main thread.
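A minimal runnable sketch of the tree-shaped delegation in Python, where worker i wakes workers 2i+1 and 2i+2 (the pool size and indexing scheme are illustrative):

```python
import threading

# Tree wakeup: each worker wakes up to two children before doing its own
# work, so the main thread does O(1) work and the full fan-out completes
# in O(log n) rounds of wakeups.

N = 7
sems = [threading.Semaphore(0) for _ in range(N)]
ran = []                              # stand-in for "...do some work..."

def worker(i):
    sems[i].acquire()                 # wait to be woken
    for child in (2 * i + 1, 2 * i + 2):
        if child < N:
            sems[child].release()     # delegate further wakeups
    ran.append(i)                     # do this worker's step

threads = [threading.Thread(target=worker, args=(i,)) for i in range(N)]
for t in threads:
    t.start()
sems[0].release()                     # the main thread wakes only the root
for t in threads:
    t.join()
```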


Read / Write Lock

The solution I usually use on POSIX systems for one-to-many relationships is a read/write lock. It came as a surprise to me that these are not completely universal, but most languages either implement a version of them or at least have a package available to implement them on whatever primitives exist, for example Python's prwlock:

```
from prwlock import RWLock

def main_thread(n):
    for i = 1 to n:
        worker_semaphore[i] = semaphore(0)
        spawn_thread(worker_thread, i)
    main_lock = RWLock()
    while True:
        main_lock.acquire_write()
        ...do some work...
        workers_to_wake = foo()
        # The above acquire could be moved as low as here,
        # depending on how independent the above processing is.
        for i in workers_to_wake:
            worker_semaphore[i].increment()  # wake up worker i
        main_lock.release()

def worker_thread(i):
    while True:
        worker_semaphore[i].decrement()      # wait to be woken
        main_lock.acquire_read()
        ...do some work...
        main_lock.release()                  # report done with step
```

Barriers

Barriers seem to be Python's closest intended built-in mechanism for holding up all the threads until they have all been alerted, but:

  • They are a rather unusual solution, so they would make your code harder to translate into other languages.

  • I would not want to use them for this case, where the set of threads to wake keeps changing. Given that your n sounds small, I would be tempted to use a constant Barrier(n) and notify all the threads to check whether they should run this loop. But:

  • I would be concerned that using a barrier could backfire, since any thread held up by something external would hold all of them up, and even a scheduler aware of the increased resource contention might miss that connection. Needing all n threads to reach the barrier can only make that worse.
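To illustrate the constant-Barrier(n) idea from the second bullet, here is a small runnable sketch in which every worker passes the barrier each frame and a per-worker flag decides whether it does any work (the names, frame count and scheduling rule are all illustrative):

```python
import threading

# Constant Barrier(n) with flags: all N workers cross the barrier every
# frame; two barriers delimit each frame so the main thread can safely
# rewrite the flags in between.

N = 4
FRAMES = 3
frame_start = threading.Barrier(N + 1)
frame_end = threading.Barrier(N + 1)
scheduled = [False] * N
work_done = []                       # stand-in for "...do some work..."

def worker(i):
    for _ in range(FRAMES):
        frame_start.wait()           # everyone starts the frame together
        if scheduled[i]:
            work_done.append(i)      # do this worker's step
        frame_end.wait()             # everyone finishes together

threads = [threading.Thread(target=worker, args=(i,)) for i in range(N)]
for t in threads:
    t.start()

for frame in range(FRAMES):
    for i in range(N):
        scheduled[i] = (i % 2 == frame % 2)   # pick who runs this frame
    frame_start.wait()
    frame_end.wait()
for t in threads:
    t.join()
```

This also demonstrates the bullet's concern: every frame is gated on all N threads reaching both barriers, scheduled or not.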

