I am looking for a lock implementation that degrades gracefully in the situation where two threads constantly try to release and re-acquire the same lock at a very high frequency.
Of course, it is clear that in this case the two threads will not really run in parallel. In theory, the best performance would be obtained by running thread 1 in its entirety and then thread 2 in its entirety, without any switching at all, because the switching creates huge overhead here. So I am looking for a lock implementation that handles this situation gracefully by keeping the same thread running for a while before switching, instead of switching constantly.
Long version of the question
As I myself would be tempted to answer this question with "your program is broken, do not do that", here is some justification for why we end up in this situation.
The lock is a "single global lock", i.e. very rough blocking. (This is the global interpreter lock (GIL) inside PyPy, but the question is how to do this in general, say if you have a C program.)
We have the following situation:
There is constant contention: the lock is a global lock that most threads need in order to make progress. So we expect most threads to be blocked waiting on the lock; only one of them can make progress at any given time.
The thread holding the lock may occasionally do short releases. A typical example is a thread that makes repeated calls to "something external", e.g. many short writes to a file. Each of these writes usually completes very quickly, but the lock must still be released just in case the external operation turns out to take longer than expected (for example, if the write really has to wait for disk I/O), so that another thread can acquire the lock in that case.
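Concretely, the lock-holding thread looks roughly like the following C/pthread sketch; `global_lock` stands in for the GIL and `write_chunk` for the "something external", both names made up for illustration:

```c
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;

/* The "something external": usually fast, occasionally blocks on I/O. */
static void write_chunk(FILE *f, const char *buf, size_t len)
{
    fwrite(buf, 1, len, f);
}

static void *worker(void *arg)
{
    FILE *f = (FILE *)arg;
    char buf[64] = {0};

    pthread_mutex_lock(&global_lock);
    for (int i = 0; i < 100000; i++) {
        /* Release the lock around each short external call, just in case
         * it blocks, and re-acquire it immediately afterwards. */
        pthread_mutex_unlock(&global_lock);
        write_chunk(f, buf, sizeof(buf));
        pthread_mutex_lock(&global_lock);
        /* ... a little work that needs the lock ... */
    }
    pthread_mutex_unlock(&global_lock);
    return NULL;
}
```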
If we use a standard mutex for the lock, it often switches to another thread as soon as the owner releases it. The problem is that if the program runs several threads, each of which wants to do a long burst of short releases, the program ends up spending most of its time just switching the lock between CPUs.
It is much faster to keep running the same thread for a while before switching, at least as long as the lock is only released for very short periods of time. (For example, on Linux/pthread a release immediately followed by an acquire will sometimes re-acquire the lock instantly even if there are other waiting threads, but we would like that outcome in the vast majority of cases, not just sometimes.)
Of course, once the lock is released for a longer period of time, it becomes a good idea to transfer ownership of the lock to another thread.
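To make the desired behaviour a bit more concrete, here is a minimal sketch (assuming Linux/pthread and C11; the names `big_lock`, `SLICE_NS` and the 5 ms value are made up) of the contract I have in mind: a short release stays "sticky" with the current thread, and only once the thread has held the lock for a longer slice does it explicitly give other threads a chance. This is not a proposed solution, just an illustration of the intended behaviour:

```c
#include <pthread.h>
#include <sched.h>
#include <stdint.h>
#include <time.h>

#define SLICE_NS (5 * 1000 * 1000)   /* made-up 5 ms "slice" per owner */

static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;
static _Thread_local uint64_t slice_start;   /* start of this thread's burst */

static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

void big_lock_acquire(void)
{
    pthread_mutex_lock(&big_lock);
    if (slice_start == 0)
        slice_start = now_ns();      /* first acquire of this burst */
}

/* A "short" release around an external call: unlock, and only if our
 * slice has expired do we also yield, so that a waiter gets a real
 * chance to run; otherwise we rely on the fact that an unlock followed
 * immediately by a lock usually re-acquires without a thread switch. */
void big_lock_release_short(void)
{
    pthread_mutex_unlock(&big_lock);
    if (now_ns() - slice_start > SLICE_NS) {
        sched_yield();
        slice_start = 0;             /* next acquire starts a new slice */
    }
}
```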
So I am looking for general ideas on how to do this. I assume it must already exist somewhere, in a paper or as part of some multithreading library?
For reference, PyPy tries to implement something like this by polling: the lock is just a global variable, modified with synchronized compare-and-swap operations but without OS calls; one of the waiting threads is given the role of "stealer"; that stealer wakes up every 100 microseconds to check the variable. This is not terribly bad (it costs 1-2% of CPU time in addition to the 100% consumed by the running thread), and it actually implements what I am asking for here. But the problem is that it is a hack that does not handle the more traditional lock cases well: for example, if thread 1 tries to send a message to thread 2 and waits for the reply, the two thread switches will each take on average 100 microseconds, which is far too much if the message is processed quickly.
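For illustration, the shape of that polling hack is roughly the following (a C11-atomics sketch, not PyPy's actual code; in particular it ignores how the single "stealer" role is assigned among several waiters and how the other waiters sleep):

```c
#include <stdatomic.h>
#include <time.h>

static atomic_int gil = 0;            /* 0 = free, 1 = held */

void gil_release(void)
{
    atomic_store(&gil, 0);            /* plain atomic store, no OS call */
}

void gil_acquire(void)
{
    int expected = 0;

    /* Fast path: the thread that just released usually gets straight
     * back in, because the stealer only polls every 100 microseconds. */
    if (atomic_compare_exchange_strong(&gil, &expected, 1))
        return;

    /* Slow path, the "stealer": poll the lock word every 100 us. */
    struct timespec pause = { .tv_sec = 0, .tv_nsec = 100 * 1000 };
    for (;;) {
        nanosleep(&pause, NULL);
        expected = 0;
        if (atomic_compare_exchange_strong(&gil, &expected, 1))
            return;
    }
}
```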