EDIT: Ben is right (and I'm an idiot for saying he wasn't) that there is a possibility the CPU will reorder the instructions and execute them simultaneously across multiple pipelines. This means value = 1 could be set before the pipeline finishes the "work". In my defense (not a total idiot?) I have never seen this happen in real life, we have an extensive library of threaded code, we run comprehensive long-duration tests, and this pattern is used everywhere. I would have seen it if it had happened: none of our tests has ever failed or produced a wrong answer. But ... Ben is right, the possibility exists. It has probably been happening in our code all along; the reordering has just never set the flag early enough for a consumer of the flag-protected data to see that data before it was complete. I will be changing our code to include barriers, because there is no guarantee this will keep working in the wild. I believe the correct solution looks like this:
Threads that read the value:

    ...
    if (value)
    {
        __sync_synchronize();
        // use the data the flag protects
    }

The thread that sets the value:

    ...
    DoStuff();
    __sync_synchronize();
    value = 1;
As an aside, I found this to be a simple explanation of barriers:
COMPILER BARRIER: Memory barriers affect the CPU. Compiler barriers affect the compiler. volatile will not keep the compiler from reordering code. See here for more info.
I believe you can use this code to keep gcc from rearranging the code at compile time:
#define COMPILER_BARRIER() __asm__ __volatile__ ("" ::: "memory")
So maybe what really needs to be done is this?
#define GENERAL_BARRIER() do { COMPILER_BARRIER(); __sync_synchronize(); } while(0)
Threads that read the value:

    ...
    if (value)
    {
        GENERAL_BARRIER();
        // use the data the flag protects
    }

The thread that sets the value:

    ...
    DoStuff();
    GENERAL_BARRIER();
    value = 1;
Using GENERAL_BARRIER() keeps gcc from reordering the code and also keeps the CPU from reordering it. Now I wonder whether gcc treats its built-in memory barrier __sync_synchronize() as a compiler barrier as well, which would make COMPILER_BARRIER unnecessary.
X86: As Ben points out, different architectures have different rules about how they reorder code in the execution pipelines. Intel seems fairly conservative, so x86 may not actually require many of these barriers. That is not a reason to leave the barriers out, though, since whatever holds today may change.
ORIGINAL POST: We do this all the time. It's perfectly safe (not in all situations, but in many). Our application runs on 1000+ servers in a huge farm with 16 instances per server, and we do not have race conditions. You are right to wonder why people use mutexes to protect operations that are already atomic. In many situations a lock is a waste of time. Reads and writes of 32-bit integers are atomic on most architectures. Don't try that with 32-bit bit fields, though!
CPU write reordering is not going to affect one thread reading a global value set by another thread. In fact, the result with locks is the same as the result without locks. If you win the race and check the value before it is changed ... well, that's the same as winning the race to lock the value so nobody can change it while you read it. Functionally the same.
The volatile keyword tells the compiler not to hold the value in a register, but to keep referring back to the original memory location. This should have no effect unless you are optimizing code. We have found the compiler to be pretty smart about this and have not yet run into a situation where something actually needed volatile. The compiler seems quite good at picking candidates for register optimization. I suspect the const keyword might encourage register optimization of a variable.
The compiler can reorder code within a function if it knows the end result will not differ. I have not seen the compiler do this with global variables, because the compiler has no idea how reordering accesses to a global would affect code outside the immediate function.
If a function is misbehaving, you can control the optimization level at the function level using __attribute__.
Now, if you were using this flag as a gate to allow only one thread in a group to do some work, that wouldn't work. Example: thread A and thread B can both read the flag. Thread A gets scheduled out. Thread B sets the flag to 1 and starts working. Thread A wakes up, also sets the flag to 1, and starts working too. Uh oh! To avoid locks and still do something like that, you need to look into atomic operations, in particular the gcc atomic builtins such as __sync_bool_compare_and_swap(&value, old, new). This sets value = new only if value is currently old. In the previous example, if value == 1, only one thread (A or B) could execute __sync_bool_compare_and_swap(&value, 1, 2) and change value from 1 to 2; the losing thread's swap would fail. __sync_bool_compare_and_swap returns whether the operation succeeded.
Deep down, the atomic builtins do involve a "lock", but it is a hardware instruction and very fast compared to using mutexes.
That said, use mutexes when you have to change many values at the same time. Atomic operations (as of today) only work when all the data that must change atomically fits within a contiguous 8, 16, 32, 64, or 128 bits.