Understanding std::atomic::compare_exchange_weak() in C++11

bool compare_exchange_weak(T& expected, T desired, ..);

compare_exchange_weak() is one of the compare-exchange primitives introduced in C++11. It's weak in the sense that it can return false even if the value of the object is equal to expected. This is because of spurious failure on some platforms where a sequence of instructions (instead of a single one, as on x86) is used to implement it. On such platforms, a context switch, a reload of the same address (or cache line) by another thread, etc., can fail the primitive. It's spurious because it's not the value of the object (not being equal to expected) that fails the operation. Instead, it's a timing issue.
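To make the expected/desired protocol concrete, here is a minimal single-threaded sketch. The strong version is used so the demonstration cannot fail spuriously; obj and try_update are illustrative names, not anything from the question.

```cpp
#include <atomic>

std::atomic<int> obj{42};

// On failure, 'actual' is overwritten with the object's real value --
// this update-on-failure behavior is shared by both weak and strong.
bool try_update(int assumed, int desired, int& actual) {
    actual = assumed;
    bool ok = obj.compare_exchange_strong(actual, desired);
    // ok == true  -> obj held 'assumed' and now holds 'desired'
    // ok == false -> 'actual' now holds obj's real value
    return ok;
}
```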

But what puzzles me is what the C++11 Standard (ISO/IEC 14882) says:

29.6.5 .. A consequence of spurious failure is that nearly all uses of weak compare-and-exchange will be in a loop.

Why does it have to be in a loop in nearly all uses? Does that mean we shall loop when it fails because of spurious failures? If that's the case, why do we bother using compare_exchange_weak() and write the loop ourselves? We could just use compare_exchange_strong(), which I think should get rid of spurious failures for us. What are the typical use cases of compare_exchange_weak()?

Another question. In his book C++ Concurrency In Action, Anthony Williams says:

    // Because compare_exchange_weak() can fail spuriously, it must typically
    // be used in a loop:
    bool expected = false;
    extern atomic<bool> b; // set somewhere else
    while (!b.compare_exchange_weak(expected, true) && !expected);
    // In this case, you keep looping as long as expected is still false,
    // indicating that the compare_exchange_weak() call failed spuriously.

Why is !expected in the loop condition? Is it there to prevent all threads from starving and making no progress for some period of time?

Edit: (last question)

On platforms where no single CAS hardware instruction exists, both the weak and strong versions are implemented using LL/SC (e.g., ARM, PowerPC, etc.). So is there any difference between the following two loops? Why, if any? (To me, they should have similar performance.)

    // use LL/SC (or CAS on x86) and ignore/loop on spurious failures
    while (!compare_exchange_weak(..)) { .. }

    // use LL/SC (or CAS on x86) and ignore/loop on spurious failures
    while (!compare_exchange_strong(..)) { .. }

I came up with this last question because you all mention that there might be a performance difference inside a loop. It's also mentioned by the C++11 Standard (ISO/IEC 14882):

When a compare-and-exchange is in a loop, the weak version will yield better performance on some platforms.

But as analyzed above, the two versions in a loop should give the same/similar performance. What am I missing?

+71
c++ multithreading atomic c++11
Aug 08 '14 at 9:11
4 answers

I'm trying to answer this myself, after going through various online resources (e.g., this and this), the C++11 Standard, as well as the answers given here.

Related questions are merged (e.g., "why !expected?" is merged into "why put compare_exchange_weak() in a loop?") and answered accordingly.




Why does compare_exchange_weak() have to be in a loop in nearly all uses?

Typical Pattern A

You want to achieve an atomic update based on the value in the atomic variable. A failure indicates that the variable was not updated with our desired value, so we want to retry. Note that we don't really care whether it fails due to a concurrent write or a spurious failure. But we do care that it is we who make this change.

    expected = current.load();
    do {
        desired = function(expected);
    } while (!current.compare_exchange_weak(expected, desired));

A real-world example is for several threads to add an element to a singly linked list concurrently. Each thread first loads the head pointer, allocates a new node, and appends the head to this new node. Finally, it attempts to swap the new node in as the head.
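The linked-list example can be sketched as follows. This is a minimal illustration of the pattern only (Node, head, and push are illustrative names; a production lock-free list would also need a reclamation strategy for pops):

```cpp
#include <atomic>

struct Node {
    int value;
    Node* next;
};

std::atomic<Node*> head{nullptr};

void push(int value) {
    // Allocate a new node that points at the current head.
    Node* node = new Node{value, head.load()};
    // On failure, compare_exchange_weak reloads the current head into
    // node->next, so the retry automatically links against the fresh head.
    while (!head.compare_exchange_weak(node->next, node)) {
        // spurious failure or concurrent push: just retry
    }
}
```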

Another example is to implement a mutex using std::atomic<bool>. At most one thread can enter the critical section at a time, depending on which thread first set current to true and exited the loop.
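A sketch of that mutex idea, assuming a simple spinlock (SpinLock is an illustrative name, not a standard type). Note how it differs from Pattern B below: expected must be reset to false before every attempt, so only the thread that actually flips the flag proceeds:

```cpp
#include <atomic>

class SpinLock {
    std::atomic<bool> locked{false};
public:
    void lock() {
        bool expected = false;
        // On failure, expected is overwritten with the current value
        // (true while someone holds the lock), so reset it each attempt.
        while (!locked.compare_exchange_weak(expected, true)) {
            expected = false;
        }
    }
    void unlock() { locked.store(false); }
};
```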

Typical Pattern B

This is actually the pattern mentioned in Anthony's book. In contrast to pattern A, you want the atomic variable to be updated once, but you don't care who does it. As long as it isn't updated, you try again. This is typically used with boolean variables. For example, you need to implement a trigger for a state machine to move on. Which thread pulls the trigger is irrelevant.

    expected = false;
    // !expected: if expected is set to true by another thread, it's done!
    // Otherwise, it failed spuriously and we should retry.
    while (!current.compare_exchange_weak(expected, true) && !expected);

Note that we generally cannot use this pattern to implement a mutex. Otherwise, multiple threads could be inside the critical section at the same time.

That said, it's rare to use compare_exchange_weak() outside a loop. On the contrary, there are cases where the strong version is in use. For example:

    bool criticalSection_tryEnter(lock)
    {
        bool flag = false;
        return lock.compare_exchange_strong(flag, true);
    }

compare_exchange_weak is not suitable here, because when it returns false due to spurious failure, it's likely that no one occupies the critical section yet.

Starving thread?

One point worth mentioning: what happens if spurious failures keep occurring, thus starving the thread? Theoretically this could happen on platforms where compare_exchange_XXX() is implemented as a sequence of instructions (e.g., LL/SC). Frequent access to the same cache line between the LL and the SC would produce continuous spurious failures. A more realistic example is a dumb scheduling where all concurrent threads are interleaved in the following way.

    Time
     |  thread 1 (LL)
     |  thread 2 (LL)
     |  thread 1 (compare, SC), fails spuriously due to thread 2's LL
     |  thread 1 (LL)
     |  thread 2 (compare, SC), fails spuriously due to thread 1's LL
     |  thread 2 (LL)
     v  ..

Can it happen?

Fortunately, it won't happen forever, thanks to what C++11 requires:

Implementations should ensure that weak compare-and-exchange operations do not consistently return false unless either the atomic object has value different from expected or there are concurrent modifications to the atomic object.

Why bother using compare_exchange_weak() and writing the loop ourselves? We could just use compare_exchange_strong().

It depends.

Case 1: when both need to be used inside a loop. C++11 says:

When a compare-and-exchange is in a loop, the weak version will yield better performance on some platforms.

On x86 (at least for now; maybe it'll resort to a similar LL/SC scheme one day when more cores are in use), the weak and strong versions are essentially the same, since they both boil down to the single cmpxchg instruction. On some other platforms where compare_exchange_XXX() isn't implemented atomically (meaning no single hardware primitive exists), the weak version inside a loop may win the battle, because the strong one would have to handle the spurious failures and retry accordingly.

But,

rarely, we may prefer compare_exchange_strong() over compare_exchange_weak() even in a loop. E.g., when there is a lot to do between the atomic variable being loaded and the new value being computed (see function() above). If the atomic variable itself doesn't change frequently, we don't need to repeat the costly computation for every spurious failure. Instead, we may hope that compare_exchange_strong() "absorbs" such failures, and we only repeat the computation when it fails due to a real value change.
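A sketch of this case. Here expensive_function stands in for the costly step; it is an illustrative name that simply counts its invocations so the effect is observable (a spurious weak failure would re-run it, while the strong version retries only on a real value change):

```cpp
#include <atomic>

int call_count = 0;

int expensive_function(int x) {
    ++call_count;        // imagine heavy work here
    return x * 2 + 1;
}

std::atomic<int> current{10};

void update() {
    int expected = current.load();
    int desired;
    do {
        desired = expensive_function(expected);
        // The strong version masks spurious failures internally, so the
        // costly computation is repeated only when the value actually
        // changed concurrently (expected is then refreshed).
    } while (!current.compare_exchange_strong(expected, desired));
}
```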

Case 2: when only compare_exchange_weak() needs to be used inside a loop. C++11 also says:

When a weak compare-and-exchange would require a loop and a strong one would not, the strong one is preferable.

This is typically the case when you loop just to eliminate spurious failures of the weak version. You retry until the exchange either succeeds or fails because of a concurrent write.

    expected = false;
    // !expected: if it fails spuriously, we should try again.
    while (!current.compare_exchange_weak(expected, true) && !expected);

At best, it's reinventing the wheel and performing the same as compare_exchange_strong(). At worst? This approach fails to take full advantage of machines that provide non-spurious compare-and-exchange in hardware.

Finally, if you loop for other reasons (e.g., see "Typical Pattern A" above), then chances are good that compare_exchange_strong() would also have to be put in a loop, which brings us back to the previous case.

+13
Aug 09 '14 at

Why do the exchange in a loop?

Usually you want your work to be done before you move on, so you put compare_exchange_weak into a loop so that it keeps trying to exchange until it succeeds (i.e., returns true).

Note that compare_exchange_strong is also often used in a loop. It does not fail due to spurious failure, but it does fail due to concurrent writes.

Why use weak instead of strong?

Quite easy: spurious failure does not happen often, so it is no big performance hit. In contrast, tolerating such a failure allows a much more efficient implementation of the weak version (in comparison to strong) on some platforms: strong must always check for spurious failure and mask it. That is expensive.

Thus, weak is used because it is a lot faster than strong on some platforms.

When should you use weak and when strong?

The reference gives hints on when to use weak and when to use strong:

When a compare-and-exchange is in a loop, the weak version will yield better performance on some platforms. When a weak compare-and-exchange would require a loop and a strong one would not, the strong one is preferable.

So the answer seems to be quite simple: if you would have to introduce a loop only because of spurious failure, don't do it; use strong. If you have a loop anyway, then use weak.
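The "if you have a loop anyway" half of the guideline can be illustrated with a running-maximum update. The loop is needed regardless of spurious failures (another thread may raise the value between the load and the exchange), so weak is the natural choice; atomic_fetch_max is a hypothetical helper, not a standard function:

```cpp
#include <atomic>

void atomic_fetch_max(std::atomic<int>& target, int value) {
    int old = target.load();
    // Loop until either our value is no longer larger, or we swap it in.
    // On CAS failure, 'old' is refreshed with the current value, so the
    // condition is re-checked against fresh data each iteration.
    while (old < value &&
           !target.compare_exchange_weak(old, value)) {
    }
}
```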

Why !expected in the example

It depends on the situation and its desired semantics, but usually it is not needed for correctness. Omitting it would yield very similar semantics. Only in a case where another thread might reset the value to false could the semantics become slightly different (yet I can't find a meaningful example where you'd want that). See Tony D.'s comment for a detailed explanation.

It is simply a fast track when another thread writes true: then we abort instead of trying to write true again.
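The fast track can be demonstrated in isolation. This is a sketch with illustrative names (b, set_once); the second call shows the CAS failing with expected set to true, so !expected ends the loop without another write:

```cpp
#include <atomic>

std::atomic<bool> b{false};

// Returns true if the flag had already been set by someone else.
bool set_once() {
    bool expected = false;
    // Loop only while the CAS fails spuriously (expected stays false).
    // If the flag is already true, expected becomes true and !expected
    // exits the loop: the fast track, with no redundant store of true.
    while (!b.compare_exchange_weak(expected, true) && !expected) {
    }
    return expected;
}
```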

About your last question

But as analyzed above, the two versions in a loop should give the same/similar performance. What am I missing?

From Wikipedia :

Real implementations of LL/SC do not always succeed even if there are no concurrent updates to the memory location in question. Any exceptional events between the two operations, such as a context switch, another load-link, or even (on many platforms) another load or store operation, will cause the store-conditional to spuriously fail. Older implementations will fail if there are any updates broadcast over the memory bus.

So, LL/SC will fail spuriously on context switches, for example. Now, the strong version brings its "own small loop" to detect that spurious failure and mask it by trying again. Note that this own loop is also more complicated than a usual CAS loop, since it must distinguish between spurious failure (and mask it) and failure due to concurrent access (which results in a return with value false). The weak version does not have such an own loop.

Since you provide an explicit loop in both examples, it is simply not necessary to have the small loop in the strong version. Consequently, in the example with the strong version, the check for failure is done twice: once by compare_exchange_strong (which is more complicated since it must distinguish spurious failure from concurrent access) and once by your loop. This expensive check is unnecessary, and that is why weak will be faster here.
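The "own small loop" idea can be sketched by emulating a strong CAS on top of a weak one. This is an illustration of the relationship only, not how any particular standard library actually implements it:

```cpp
#include <atomic>

// A strong CAS is, conceptually, a weak CAS wrapped in a loop that
// masks spurious failures. On a spurious failure, 'expected' is
// reloaded with the current value, which still equals the saved copy,
// so we can tell the two failure kinds apart.
template <typename T>
bool emulated_compare_exchange_strong(std::atomic<T>& obj,
                                      T& expected, T desired) {
    const T saved = expected;
    for (;;) {
        if (obj.compare_exchange_weak(expected, desired))
            return true;              // exchange performed
        if (expected != saved)
            return false;             // real failure: value differed
        // expected == saved: spurious failure, retry
    }
}
```

When the caller already loops (as in the question's examples), this inner distinguishing work is wasted, which is exactly the double check described above.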

Also note that your argument (LL/SC) is just one possibility to implement it. There are platforms with even different instruction sets. In addition (and more importantly), note that std::atomic must support all operations for all possible data types, so even if you declare a ten-million-byte struct, you can use compare_exchange on it. Even on a CPU that has CAS, you cannot CAS ten million bytes, so the compiler will generate other instructions (probably lock acquire, followed by a non-atomic compare-and-swap, followed by lock release). Now, think of how many things could happen while swapping ten million bytes. So while a spurious error may be extremely rare for 8-byte exchanges, it might be more common in this case.

So, in a nutshell, C++ gives you two semantics: a "best effort" one (weak) and an "I will do it for sure, no matter how many bad things might happen in between" one (strong). How these are implemented on various data types and platforms is a totally different topic. Don't tie your mental model to the implementation on your specific platform; the standard library is designed to work with more architectures than you might be aware of. The only general conclusion we can draw is that guaranteeing success is usually more difficult (and thus may require additional work) than just trying and leaving room for possible failure.

+60
Aug 08 '14 at 9:21

Why should it be in a loop in almost all uses?

Because if you don't loop and it fails spuriously, your program hasn't done anything useful — you haven't updated the atomic object and you don't know what its current value is (Correction: see the comment below from Cameron). If the call doesn't do anything useful, what's the point of making it?

Does that mean we shall loop when it fails because of spurious failures?

Yes.

If so, why do we bother using compare_exchange_weak() and write the loop ourselves? We could just use compare_exchange_strong(), which, it seems to me, should get rid of spurious failures for us. What are the common use cases of compare_exchange_weak()?

On some architectures compare_exchange_weak is more efficient, and spurious failures should be fairly uncommon, so it may be possible to write more efficient algorithms using the weak form and a loop.

In general, it is probably better to use the strong version instead if your algorithm doesn't need to loop, as you don't need to worry about spurious failures. If it needs to loop anyway even for the strong version (and many algorithms do need to loop anyway), then using the weak form may be more efficient on some platforms.
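A typical "loop anyway" case, sketched here as an atomic doubling of a counter (atomic_double is an illustrative helper, not a standard function) — the retry is required by concurrent writes regardless of spurious failures, so weak fits:

```cpp
#include <atomic>

void atomic_double(std::atomic<int>& counter) {
    int old = counter.load();
    // On each failed attempt (concurrent write or spurious failure),
    // 'old' is refreshed, and 'old * 2' is recomputed from it.
    while (!counter.compare_exchange_weak(old, old * 2)) {
    }
}
```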

Why is !expected in the loop condition?

The value could be set to true by another thread, so you don't want to keep looping trying to set it.

Edit:

But as analyzed above, the two versions in a loop should give the same/similar performance. What am I missing?

It's simply that, on platforms where spurious failure is possible, the implementation of compare_exchange_strong has to be more complicated, in order to check for spurious failure and retry.

The weak form just returns on failure; it doesn't retry.

+15
Aug 08 '14 at 9:22

OK, so I need a function that performs an atomic left shift. There's no such operation native to the processor, and there's no function for it in the standard library, so it looks like I'm writing my own. Here goes:

    void atomicLeftShift(std::atomic<int>* var, int shiftBy)
    {
        int oldVal = std::atomic_load(var);
        int newVal;
        do {
            newVal = oldVal << shiftBy;
            // on failure, oldVal is refreshed with the current value
        } while (!var->compare_exchange_weak(oldVal, newVal));
    }

Now, there are two reasons the loop might run more than once.

  • Someone changed the variable while I was doing my left shift. The result of my computation should not be applied to the atomic variable, because it would effectively erase that someone else's write.
  • My CPU hiccupped and the weak CAS spuriously failed.

Honestly, I don't care which one occurred. The left shift is fast enough that I can just do it again, even if the failure was spurious.

What's less fast, though, is the additional code that a strong CAS needs to wrap around the weak CAS in order to be strong. That code doesn't do much when the weak CAS succeeds... but when it fails, the strong CAS has to do some detective work to determine whether it was case 1 or case 2. That detective work takes the form of a second loop, effectively nested inside my own loop. Two nested loops. Imagine your algorithms teacher glaring at you right now.

And as I mentioned, I don't care about the result of that detective work! I'm going to redo the CAS either way. So using a strong CAS gains me nothing, and loses me a small but measurable amount of efficiency.

In other words, a weak CAS is used to implement atomic update operations. A strong CAS is used when you care about the result of the CAS.

+10
Aug 08 '14 at 15:53


