In "volatile: The Multithreaded Programmer's Best Friend", Andrei Alexandrescu gives this example:
```cpp
class Gadget {
public:
    void Wait() {
        while (!flag_) {
            Sleep(1000); // sleeps for 1000 milliseconds
        }
    }
    void Wakeup() {
        flag_ = true;
    }
    // ...
private:
    bool flag_;
};
```
He claims:
... the compiler concludes that it can cache flag_ in a register ... it does so to the detriment of correctness: after you call Wait for some Gadget object, although another thread calls Wakeup, Wait will loop forever. This is because the change of flag_ will not be reflected in the register that caches flag_.
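As I understand it, the optimization he describes would effectively turn the loop into something like this (my own sketch, not actual compiler output):

```cpp
void Wait() {
    bool cached = flag_;  // flag_ is read once into a register
    while (!cached) {     // the loop keeps testing the stale register copy,
        Sleep(1000);      // so a Wakeup() from another thread is never observed
    }
}
```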
Then he offers a solution:
If you use the volatile modifier on a variable, the compiler won't cache that variable in registers; each access will hit the actual memory location of that variable.
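In code, the fix he describes amounts to just (my paraphrase):

```cpp
class Gadget {
    // ... same as before ...
private:
    volatile bool flag_;  // volatile: every access must be a real memory access
};
```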
Now, others have pointed out on Stack Overflow and elsewhere that the volatile keyword offers no thread-safety guarantees at all, and that I should use std::atomic or mutex synchronization instead, which I agree with.
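So, for example, taking the std::atomic route, the class would become something like this (my own sketch; I'm keeping the Sleep call from the original example):

```cpp
#include <atomic>

class Gadget {
public:
    void Wait() {
        // acquire load: re-reads flag_ from memory on every iteration
        while (!flag_.load(std::memory_order_acquire)) {
            Sleep(1000);
        }
    }
    void Wakeup() {
        // release store: publishes the new value to other threads
        flag_.store(true, std::memory_order_release);
    }
private:
    std::atomic<bool> flag_{false};
};
```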
However, taking this std::atomic route, which internally uses read-acquire and write-release memory barriers (acquire and release semantics), I don't see how it actually fixes the register-caching problem in particular.
In the case of x86, for example, every load on x86/64 already implies acquire semantics, and every store implies release semantics, so compiled code for x86 doesn't emit any real memory barriers at all. (From "The Purpose of memory_order_consume in C++11":)
```cpp
g = Guard.load(memory_order_acquire);
if (g != 0)
    p = Payload;
```

On Intel x86-64, the Clang compiler generates compact machine code for this example - one machine instruction per line of C++ source code. This processor family has a strong memory model, so the compiler doesn't need to emit special memory barrier instructions to implement the read-acquire.
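For reference, here is a self-contained version of that snippet (the surrounding declarations are my assumptions about the article's setup):

```cpp
#include <atomic>

std::atomic<int> Guard(0);
int Payload = 0;

int g, p;

void reader() {
    g = Guard.load(std::memory_order_acquire);  // a plain mov on x86-64
    if (g != 0)
        p = Payload;
}
```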
So... assuming x86 arch for the moment, how does std::atomic solve the register-caching problem? With no read-acquire memory barrier instructions in the compiled code, it looks as though it would compile to the same code as an ordinary read.
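In other words, the comparison I have in mind is roughly this (my own sketch; on x86 I'd expect both functions to compile to a plain mov):

```cpp
#include <atomic>

std::atomic<bool> g_flag_atomic{false};
bool g_flag_plain = false;

bool read_atomic() {
    // no fence instruction on x86, just an ordinary load
    return g_flag_atomic.load(std::memory_order_acquire);
}

bool read_plain() {
    // also an ordinary load... so what stops the compiler
    // from hoisting this one into a register?
    return g_flag_plain;
}
```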