C++11: register caching and thread safety

In "volatile: The Multithreaded Programmer's Best Friend", Andrei Alexandrescu gives this example:

    class Gadget {
    public:
        void Wait() {
            while (!flag_) {
                Sleep(1000); // sleeps for 1000 milliseconds
            }
        }
        void Wakeup() {
            flag_ = true;
        }
        ...
    private:
        bool flag_;
    };

He claims:

... the compiler concludes that it can cache flag_ in a register ... this harms correctness: after you call Wait on some Gadget, even if another thread calls Wakeup, Wait will loop forever. This is because the change to flag_ will not be reflected in the register that caches flag_.

He then offers a solution:

If you use the volatile modifier on a variable, the compiler will not cache that variable in registers: each access will go to the actual memory location of that variable.

Now, other people here on Stack Overflow and elsewhere have pointed out that the volatile keyword really offers no thread-safety guarantees, and that I should use std::atomic or mutex synchronization instead, which I agree with.
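For reference, a minimal sketch of the Gadget class rewritten with std::atomic<bool>, assuming C++11 or later. std::this_thread::sleep_for stands in for the Windows-only Sleep, the polling interval is shortened so the demo finishes quickly, and run_gadget_demo is a hypothetical driver, not part of Alexandrescu's article.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical rewrite of Gadget using std::atomic<bool> instead of bool.
class Gadget {
public:
    void Wait() {
        // Every iteration performs an atomic load of flag_. In practice,
        // compilers emit a real load each time instead of keeping the
        // value in a register, as they may for a plain bool.
        while (!flag_.load(std::memory_order_acquire)) {
            std::this_thread::sleep_for(std::chrono::milliseconds(10));
        }
    }

    void Wakeup() {
        flag_.store(true, std::memory_order_release);
    }

private:
    std::atomic<bool> flag_{false};
};

// Hypothetical driver: one thread waits, the calling thread wakes it.
bool run_gadget_demo() {
    Gadget g;
    std::atomic<bool> waiter_done{false};
    std::thread waiter([&] {
        g.Wait();
        waiter_done.store(true);
    });
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
    assert(!waiter_done.load());  // Wait() cannot return before Wakeup()
    g.Wakeup();
    waiter.join();                // returns once Wait() has seen the store
    return waiter_done.load();
}
```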

However, taking the std::atomic route, for example, which internally uses read_acquire and write_release memory barriers (acquire and release semantics), I don't see how it actually fixes the register-caching problem in particular.

On x86, for example, every load on x86/64 already implies acquire semantics and every store implies release semantics, so the compiled code for x86 contains no actual memory barriers at all. From "The Purpose of memory_order_consume in C++11":

    g = Guard.load(memory_order_acquire);
    if (g != 0)
        p = Payload;


On Intel x86-64, the Clang compiler generates compact machine code for this example: one machine instruction per line of C++ source code. This family of processors has a strong memory model, so the compiler does not need to emit special memory-fence instructions to implement the read-acquire.
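A runnable sketch of the quoted Guard/Payload pattern, assuming C++11 and assuming the reader spins until the guard is set rather than checking it once. The globals follow the quoted names; producer, consumer, and run_guard_demo are hypothetical. On x86-64, both the release store and the acquire load typically compile to plain mov instructions, which is exactly the situation the question is about.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int Payload = 0;             // ordinary, non-atomic data
std::atomic<int> Guard{0};   // atomic flag that publishes Payload

void producer() {
    Payload = 42;                               // plain write
    Guard.store(1, std::memory_order_release);  // publishes the write above
}

int consumer() {
    // The acquire load is re-executed on every iteration; on x86-64 it is
    // typically a plain mov with no fence instruction.
    while (Guard.load(std::memory_order_acquire) == 0) {
        std::this_thread::yield();
    }
    int p = Payload;  // ordered after the acquire load that saw 1
    assert(p == 42);  // the release/acquire pairing guarantees this
    return p;
}

// Hypothetical driver: runs the consumer on a second thread.
int run_guard_demo() {
    Payload = 0;
    Guard.store(0);
    int result = 0;
    std::thread reader([&] { result = consumer(); });
    producer();
    reader.join();
    return result;
}
```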

So... just assuming the x86 architecture for the moment, how does std::atomic solve the register-caching problem? With no read-acquire fence instructions in the compiled code, it looks just like the code compiled for an ordinary read.

2 answers

Did you notice that the code contains no read from a register alone? There is an explicit memory load from Guard. So it did, in fact, prevent caching in a register.

Exactly how this is done depends on each platform's implementation of std::atomic, but the implementation must do it.
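A hypothetical illustration, assuming C++11: the flag below is read with memory_order_relaxed, which imposes no ordering and needs no fence instructions on mainstream platforms. The loop still terminates in practice, because each call to load() is a real atomic read of the flag's memory location, not a value the compiler keeps in a register. The names stop_flag, spin_until_stopped, and run_relaxed_demo are made up for this sketch.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical stop flag: accessed only with relaxed operations.
std::atomic<bool> stop_flag{false};

unsigned long long spin_until_stopped() {
    unsigned long long iterations = 0;
    // Each iteration calls load(). Mainstream compilers do not collapse
    // these calls into one read cached in a register, as they may for a
    // plain bool, even though no ordering is requested.
    while (!stop_flag.load(std::memory_order_relaxed)) {
        ++iterations;
    }
    return iterations;
}

// Hypothetical driver: lets a worker spin briefly, then stops it.
unsigned long long run_relaxed_demo() {
    stop_flag.store(false);
    unsigned long long count = 0;
    std::thread worker([&] { count = spin_until_stopped(); });
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
    stop_flag.store(true, std::memory_order_relaxed);
    worker.join();  // returns only because the worker observed the store
    return count;
}
```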

And by the way, Alexandrescu's arguments are completely wrong for modern platforms. While it is true that volatile prevents the compiler from caching a value in a register, it does not prevent similar caching by the processor or other hardware. On some platforms this may turn out to be adequate, but there is absolutely no reason to write gratuitously non-portable code that might break on a future processor, compiler, library, or platform when a fully portable alternative is available.


volatile is not required for any "sensible" implementation once the Gadget example is modified to use std::atomic<bool>. The reason is not that the standard forbids keeping the value in a register; rather, it says (§29.3/13 in N3690):

Implementations should make atomic stores visible to atomic loads within a reasonable amount of time.

Of course, what counts as "reasonable" is open to interpretation, and it is a "should", not a "shall", so an implementation can ignore the requirement without violating the letter of the standard. Typical implementations neither cache the results of atomic loads nor (significantly) delay issuing atomic stores to the CPU, and thus leave the matter mostly to the hardware. If you want to enforce this behavior, you should use volatile std::atomic<bool> instead. In either case, if another thread sets the flag, Wait() should terminate, but if your compiler and/or processor is so inclined, it may still take much longer than you would like.
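A sketch of the volatile std::atomic<bool> variant, assuming C++11 or later. std::atomic declares volatile-qualified overloads of load() and store(), so Wait() and Wakeup() are unchanged apart from the declaration of the flag. The class name VolatileGadget and the run_volatile_demo driver are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical Gadget variant whose flag is volatile std::atomic<bool>.
class VolatileGadget {
public:
    void Wait() {
        // Calls the volatile-qualified load() overload.
        while (!flag_.load()) {
            std::this_thread::sleep_for(std::chrono::milliseconds(10));
        }
    }

    void Wakeup() {
        flag_.store(true);  // volatile-qualified store() overload
    }

private:
    // volatile on top of atomic: intended to make the compiler perform every
    // access as written rather than merging or eliding any of them.
    volatile std::atomic<bool> flag_{false};
};

// Hypothetical driver, same shape as for the non-volatile version.
bool run_volatile_demo() {
    VolatileGadget g;
    std::atomic<bool> waiter_done{false};
    std::thread waiter([&] {
        g.Wait();
        waiter_done.store(true);
    });
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
    assert(!waiter_done.load());  // Wait() cannot return before Wakeup()
    g.Wakeup();
    waiter.join();
    return waiter_done.load();
}
```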

Also note that a memory fence does not guarantee that a store becomes visible to another thread immediately, or any sooner than it otherwise would. So even if the compiler added fence instructions to the Gadget methods, they would not help at all. Fences are used to guarantee consistency, not to improve performance.
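A sketch of what fences are actually for, assuming C++11: a release fence before a relaxed store, paired with an acquire fence after a relaxed load that observes that store, guarantees that the reader sees the writer's earlier plain write. Nothing in it makes the store reach the reader any sooner. The names shared_data, ready, writer, reader, and run_fence_demo are hypothetical.

```cpp
#include <atomic>
#include <thread>

int shared_data = 0;             // plain, non-atomic payload
std::atomic<bool> ready{false};  // flag accessed only with relaxed operations

void writer() {
    shared_data = 7;
    std::atomic_thread_fence(std::memory_order_release);
    ready.store(true, std::memory_order_relaxed);
}

int reader() {
    while (!ready.load(std::memory_order_relaxed)) {
        std::this_thread::yield();
    }
    // Pairs with the release fence in writer(): the write of 7 is visible.
    std::atomic_thread_fence(std::memory_order_acquire);
    return shared_data;
}

// Hypothetical driver: runs reader() on a second thread.
int run_fence_demo() {
    shared_data = 0;
    ready.store(false);
    int result = 0;
    std::thread t([&] { result = reader(); });
    writer();
    t.join();
    return result;
}
```

The fences only order the write to shared_data relative to the flag; how quickly the reader sees ready become true is still up to the compiler and hardware.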

