How to synchronize threads without using mutexes, semaphores, spinlocks, or futexes?

This is an interview question, from an interview I actually had.


Given 5 threads, how do you make 4 of them wait, at the same point, for a signal from the fifth thread? That is, when threads 1, 2, 3 and 4 each reach a certain point in their thread function, they stop and wait for thread 5 to send a signal; until it does, they must not proceed.

My idea:

Use a global bool variable as a flag. As long as thread 5 has not set it to true, all the other threads wait at some point, each having set its own per-thread flag to true. Once thread 5 sees that every thread's flag is true, it sets the global flag to true.

This is busy waiting, though.

Any better ideas?

thanks

the pseudo code (note: the original version never set globalflag, so the waiters would spin forever; that is fixed here):

    bool globalflag = false;
    bool a[10] = { false };

    int main() {
        for (int i = 0; i < 10; i++)
            pthread_create(threadfunc, i);
        while (1) {
            bool b = true;
            for (int i = 0; i < 10; i++)
                b = a[i] && b;
            if (b)
                break;
        }
        globalflag = true;   // release the waiting threads
    }

    void threadfunc(int i) {
        a[i] = true;         // announce arrival at the rendezvous point
        while (!globalflag)
            ;                // busy-wait
    }
4 answers

Start with an empty linked list of waiting threads. The head should initially be 0 (NULL).

Use CAS (compare-and-swap) to insert each thread at the head of the waiters list. If head == -1, do not insert and do not wait. You can safely use CAS to insert items at the head of a linked list if you do it right.

After inserting itself, the waiting thread should wait for SIGUSR1. Use sigwait() for this.

When ready, the signaling thread uses CAS to set the wait-list head to -1. This prevents any more threads from adding themselves to the wait list. The signaling thread then iterates over the threads in the wait list and calls pthread_kill(tid, SIGUSR1) to wake each waiting thread.

If SIGUSR1 is sent before a thread calls sigwait, the blocked signal stays pending and sigwait returns immediately. So there is no race between adding a thread to the wait list and the call to sigwait.

EDIT:

Why is CAS faster than a mutex? Disclaimer: I'm a layman here. It is faster for some things in some situations, because it has lower overhead when there is no contention. So if you can reduce your concurrent problem to needing to change 8-16-32-64-128 bits of contiguous memory, and contention does not happen very often, CAS wins. CAS is basically a slightly fancier, more expensive mov instruction, right where you were going to do a plain "mov" anyway. It's a "lock cmpxchg" or something like that.

A mutex, on the other hand, involves a whole bunch of extra stuff that pollutes other cache lines and uses more memory barriers, etc. (although CAS itself acts as a memory barrier on x86, x64, etc.). Then, of course, you have to unlock the mutex, which is probably about the same amount of extra stuff again.

Here's how you add an item to a linked list using CAS:

    while (1) {
        pOldHead = pHead;                  // snapshot of the world; the race starts here
        pItem->pNext = pOldHead;
        if (CAS(&pHead, pOldHead, pItem))  // race over: succeeds only if pHead is still pOldHead
            break;                         // success
    }

So, how often do you think your code will have multiple threads at that CAS line at the same time? In reality... not very often. We ran tests that just looped, adding millions of items with multiple threads at the same time, and contention happened less than 1% of the time. In a real program, it will hardly ever happen.

Obviously, if there is a race, you have to go back and do the loop again, but in the case of a linked list, what does that cost you?

The downside is that you can't do very complicated things with this linked list if you intend to use this method to add items at the head. Try implementing a doubly linked list this way; it is quite a pain.

EDIT:

In the above code, I am using a CAS macro. On Linux, the CAS macro uses __sync_bool_compare_and_swap (see the GCC atomic builtins). On Windows, the CAS macro uses something like InterlockedCompareExchange. Here's what the inline functions might look like on Windows:

    inline bool CAS(volatile WORD* p, const WORD nOld, const WORD nNew) {
        return InterlockedCompareExchange16((short*)p, nNew, nOld) == nOld;
    }

    inline bool CAS(volatile DWORD* p, const DWORD nOld, const DWORD nNew) {
        return InterlockedCompareExchange((long*)p, nNew, nOld) == nOld;
    }

    inline bool CAS(volatile QWORD* p, const QWORD nOld, const QWORD nNew) {
        return InterlockedCompareExchange64((LONGLONG*)p, nNew, nOld) == nOld;
    }

    inline bool CAS(void* volatile* p, const void* pOld, const void* pNew) {
        return InterlockedCompareExchangePointer(p, (PVOID)pNew, (PVOID)pOld) == pOld;
    }
  • Pick a signal to use, say, SIGUSR1.
  • Use pthread_sigmask to block SIGUSR1.
  • Create the threads (they inherit the signal mask, so this must be done first!)
  • Threads 1-4 call sigwait(), blocking until SIGUSR1 is received.
  • Thread 5 calls kill() or pthread_kill four times with SIGUSR1. Because POSIX specifies that the signal will be delivered to exactly one such thread, it will be delivered to one of the threads waiting in sigwait(). There is thus no need to track which threads have already received the signal and which have not, with the corresponding synchronization.

You could do this using the SSE3 MONITOR and MWAIT instructions, available through the _mm_monitor and _mm_mwait intrinsics; Intel has an article about it here. (There is also a patent on using memory monitor-wait for lock contention here, which may be of interest.)


I think you are looking for Peterson's algorithm or Dekker's algorithm.

They synchronize threads using only shared memory.

