Are atomic types necessary in multithreading? (OS X, clang, C++11)

Question

I am trying to demonstrate that it is a very bad idea not to use std::atomic<>, but I can't manage to create an example that reproduces the failure. I have two threads. One of them does:

 { foobar = false; }

and the other:

 {
     if (foobar) {
         // ...
     }
 }

The type of foobar is either bool or std::atomic_bool, and it is initialized to true. I am on OS X Yosemite, and I even tried this trick to hint, via CPU affinity, that I want the threads to run on different cores. I run these operations in loops, etc., and in every case there is no noticeable difference in execution. I ended up inspecting the assembly generated by clang (clang -std=c++11 -lstdc++ -O3 -S test.cpp), and I see that the differences on the read side are minor (non-atomic on the left, atomic on the right):

There is no mfence or anything equally “dramatic”. On the write side, something more “dramatic” happens:

As you can see, the atomic<> version uses xchgb, which has an implicit lock. When I compile with a relatively old version of gcc (v4.5.2), I can see all sorts of mfence instructions, which again indicates that there is a serious concern.

I understand that “X86 implements a very strong memory model” (ref) and that mfence might not be necessary, but does this mean that unless I want to write cross-platform code that, for example, also supports ARM, I don't really need to use atomic<> unless I care about consistency at the ns level?

I have watched "atomic<> Weapons" by Herb Sutter, but I'm still surprised by how difficult it is to create a simple example that reproduces these problems.

+5

c++ gcc multithreading c++11 clang

neverlastn Sep 10 '16 at 18:05

3 answers

This is my own version of @Sebastian Redl's answer, one that fits the question more closely. I will still accept his answer for credit, plus kudos to @HansPassant for his comment, which drew my attention back to the writes and made everything clear: once I had observed that the compiler was adding synchronization on writes, the real issue turned out to be that it wasn't optimizing the bool case as much as one would expect.

I was able to write a trivial program that reproduces the problem:

 std::atomic_bool foobar(true);
 //bool foobar = true;

 long long cnt = 0;
 long long loops = 400000000ll;

 void thread_1() {
     usleep(200000);
     foobar = false;
 }

 void thread_2() {
     while (loops--) {
         if (foobar) {
             ++cnt;
         }
     }
     std::cout << cnt << std::endl;
 }

The main difference from my original code was that I used to have a usleep() inside the while loop. That was enough to prevent any optimizations within the loop. The cleaner code above yields the same asm for the write:

but completely different asm for the read:

We can see that in the bool case (left), clang hoisted the if (foobar) out of the loop. Thus, when I run the bool case, I get:

 400000000

 real 0m1.044s
 user 0m1.032s
 sys 0m0.005s

whereas when I run the atomic_bool case, I get:

 95393578

 real 0m0.420s
 user 0m0.414s
 sys 0m0.003s

Interestingly, the atomic_bool case is faster. I guess this is because it performs only 95 million increments on the counter, as opposed to 400 million in the bool case.

What is even more interesting is this. If I move the std::cout << cnt << std::endl; out of the threaded code, to after pthread_join(), the loop in the non-atomic case becomes the following:

i.e. there is no loop at all. It is just if (foobar != 0) cnt = loops;! Clever clang. Then execution gives:

 400000000

 real 0m0.206s
 user 0m0.001s
 sys 0m0.002s

while the atomic_bool case remains unchanged.
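The transformation described above can be sketched at the source level. This is only an illustration of what the optimizer is allowed to do with a plain bool, not clang's actual output; the function names count_as_written and count_as_optimized are hypothetical, and the flag is passed in as a parameter purely for illustration.

```cpp
// Source-level sketch of the transformation. With a plain bool and no
// synchronization inside the loop, the compiler may assume `flag` never
// changes during the loop, so it may treat both functions as equivalent.

// The loop roughly as written in thread_2.
long long count_as_written(const bool& flag, long long loops) {
    long long cnt = 0;
    while (loops--) {
        if (flag) {
            ++cnt;
        }
    }
    return cnt;
}

// What the loop may collapse to: one read of the flag and no loop at all.
long long count_as_optimized(const bool& flag, long long loops) {
    return flag ? loops : 0;
}
```

Single-threaded, the two functions return the same value for any non-negative loop count, which is exactly why the optimizer considers the rewrite legal.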

Thus, there is more than enough evidence that we should use atomics. The only thing to remember is not to put any usleep() in your benchmarks, because even a small one will prevent quite a few compiler optimizations.
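For anyone who wants to repeat the experiment, here is a minimal self-contained sketch of the atomic variant. It assumes std::thread and std::this_thread::sleep_for in place of the pthreads and usleep() used above, and the function name count_until_cleared is hypothetical. The returned count depends on timing and will differ from run to run.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Returns how many loop iterations still saw the flag set.
long long count_until_cleared(long long loops) {
    std::atomic<bool> foobar(true);
    long long cnt = 0;

    // Writer: clears the flag after a short delay.
    std::thread writer([&foobar] {
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
        foobar = false;
    });

    // Reader: the atomic load has to be performed on every iteration,
    // so the check cannot be hoisted out of the loop.
    std::thread reader([&foobar, &cnt, loops]() mutable {
        while (loops--) {
            if (foobar) {
                ++cnt;
            }
        }
    });

    writer.join();
    reader.join();
    return cnt;  // safe to read: both threads have been joined
}
```

Swapping std::atomic<bool> for a plain bool turns this into a data race, which is the undefined behavior the measurements above expose.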

+1

neverlastn Sep 11 '16 at 16:23

In general, it is very rare that atomic types actually do anything useful for you in multithreaded situations. They are more useful for implementing things like mutexes, semaphores, etc.

One reason why they are not very useful: as soon as you have two values that both need to change atomically, you are completely stuck. You cannot do that with atomic values. And it is quite rare that I want to change only a single value atomically.
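As a minimal sketch of the two-values case, the snippet below guards a pair of integers with std::mutex so that both always change together. The Pair type, its members, and invariant_holds are hypothetical names chosen for illustration.

```cpp
#include <mutex>
#include <thread>

// Two values that must always change together (invariant: lo + hi == 100).
struct Pair {
    std::mutex m;
    int lo = 0;
    int hi = 100;

    // Move one unit from hi to lo as a single indivisible step.
    void shift() {
        std::lock_guard<std::mutex> lock(m);
        ++lo;
        --hi;
    }

    // Read both values under the same lock so a reader never sees
    // one value updated and the other not.
    int sum() {
        std::lock_guard<std::mutex> lock(m);
        return lo + hi;
    }
};

// Runs two writers concurrently and returns true if a concurrent reader
// never observed a broken invariant.
bool invariant_holds(int iterations) {
    Pair p;
    bool ok = true;
    std::thread w1([&] { for (int i = 0; i < iterations; ++i) p.shift(); });
    std::thread w2([&] { for (int i = 0; i < iterations; ++i) p.shift(); });
    for (int i = 0; i < iterations; ++i) {
        if (p.sum() != 100) ok = false;
    }
    w1.join();
    w2.join();
    return ok && p.sum() == 100;
}
```

Making lo and hi two separate std::atomic<int> objects would not give the same guarantee, because a reader could run between the two updates.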

iOS and Mac OS X use three methods: protecting changes with @synchronized; avoiding multithreaded access by running code on a serial queue (which may be the main queue); and using mutexes.

I hope you know that atomicity for boolean values is rather pointless. What you have is a race condition: one thread stores a value, another reads it. Atomicity makes no difference here. It makes (or may make) a difference when two threads accessing a variable at exactly the same time cause problems. For example, if a variable is incremented by two threads at exactly the same time, is it guaranteed that the final result is increased by two? That requires atomicity (or one of the methods mentioned earlier).
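The increment case mentioned above can be sketched as follows. This is a minimal illustration; the function name increment_from_two_threads is hypothetical.

```cpp
#include <atomic>
#include <thread>

// Two threads each add `per_thread` to a shared counter. fetch_add is an
// atomic read-modify-write, so no increment can be lost.
long increment_from_two_threads(long per_thread) {
    std::atomic<long> counter(0);
    auto work = [&counter, per_thread] {
        for (long i = 0; i < per_thread; ++i) {
            counter.fetch_add(1);
        }
    };
    std::thread a(work);
    std::thread b(work);
    a.join();
    b.join();
    return counter.load();
}
```

With a plain long instead of std::atomic<long>, the two read-modify-write sequences could interleave and lose updates, and the program would also contain a data race.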

Sebastian makes the ridiculous claim that atomics fix the data race. That is nonsense. In a data race, a reader reads a value either before or after it is changed; whether the value is atomic or not makes no difference whatsoever. The reader will read the old value or the new value, so the behavior is unpredictable. All that atomicity does is prevent the reader from seeing some in-between state, which does not fix the data race.

-9

gnasher729 Sep 10 '16 at 18:13

Sebastian Redl · Accepted Answer · 2016-09-10T18:15:49+0000

The big problem with data races is that they are undefined behavior, not guaranteed wrong behavior. This, combined with the general unpredictability of threads and the strength of the x64 memory model, means that it is very hard to create reproducible failures.

A slightly more reliable failure mode is when the optimizer does unexpected things, because you can observe those in the assembly. Of course, the optimizer is notoriously finicky as well and might do something completely different if you change just one line of code.

Here is an example of a failure that we had in our code at one point. The code implemented a sort of spin lock, but did not use atomics.

 bool operation_done;

 void thread1() {
     while (!operation_done) {
         sleep();
     }
     // do something that depends on operation being done
 }

 void thread2() {
     // do the operation
     operation_done = true;
 }

This worked fine in debug mode, but the release build got stuck. Debugging showed that execution of thread1 never left the loop, and looking at the assembly, we found that the condition was gone; the loop was simply infinite.

The problem was that the optimizer realized that, under its memory model, operation_done could not possibly change within the loop (that would have been a data race), and thus it “knew” that once the condition was true, it would be true forever.

Changing the type of operation_done to atomic_bool (or actually, a pre-C++11 compiler-specific equivalent) fixed the issue.
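A minimal C++11 sketch of the fixed version might look like this. It assumes std::atomic<bool> rather than the compiler-specific equivalent actually used at the time; operation_result and run_spin_wait are hypothetical names added so the effect can be observed.

```cpp
#include <atomic>
#include <chrono>
#include <functional>
#include <thread>

std::atomic<bool> operation_done(false);
int operation_result = 0;  // hypothetical payload produced by the operation

void thread1(int& seen) {
    // The atomic load must be performed on every iteration, so the
    // optimizer cannot turn this into an infinite loop.
    while (!operation_done.load()) {
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    // Safe: this read is ordered after the store in thread2.
    seen = operation_result;
}

void thread2() {
    operation_result = 42;       // do the operation
    operation_done.store(true);  // publish completion
}

// Runs both threads and returns what thread1 observed.
int run_spin_wait() {
    operation_done.store(false);
    operation_result = 0;
    int seen = -1;
    std::thread t1(thread1, std::ref(seen));
    std::thread t2(thread2);
    t1.join();
    t2.join();
    return seen;
}
```

The default sequentially consistent store and load also guarantee that the plain write to operation_result is visible to thread1 once it sees the flag set.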
