Memory ordering of std::atomic::load

Is it wrong to assume that atomic::load should also act as a memory barrier, ensuring that all previous non-atomic writes become visible to other threads?

To illustrate:

```cpp
volatile bool arm1 = false;
std::atomic_bool arm2 = false;
bool triggered = false;
```

Thread1:

```cpp
arm1 = true;
//std::atomic_thread_fence(std::memory_order_seq_cst); // this would do the trick
if (arm2.load())
    triggered = true;
```

Thread 2:

```cpp
arm2.store(true);
if (arm1)
    triggered = true;
```

I expected that after both threads have executed, triggered would be true. Please do not suggest making arm1 atomic; the point here is to investigate the behavior of atomic::load.

Although I must admit that I do not fully understand the formal definitions of the various relaxed memory-order semantics, I thought that sequentially consistent ordering was fairly straightforward, since it guarantees that "a single total order exists in which all threads observe all modifications in the same order". To me this implies that std::atomic::load with the default memory order std::memory_order_seq_cst will also act as a memory fence. This is further supported by the following statement in the "Sequentially-consistent ordering" section:

Total sequential ordering requires a full memory fence CPU instruction on all multi-core systems.

However, my simple example shows that this is not what happens with MSVC 2013, gcc 4.9 (x86), or clang 3.5.1 (x86), where the atomic load is translated into a plain load instruction.

```cpp
#include <atomic>

std::atomic_long al;

#ifdef _WIN32
__declspec(noinline)
#else
__attribute__((noinline))
#endif
long load() {
    return al.load(std::memory_order_seq_cst);
}

int main(int argc, char* argv[]) {
    long r = load();
}
```

With gcc, it looks like this:

```asm
load():
        mov     rax, QWORD PTR al[rip]  ; <--- plain load here, no fence or xchg
        ret
main:
        call    load()
        xor     eax, eax
        ret
```

I omit msvc and clang, which are essentially identical. Now, with gcc targeting ARM, we get what I expected:

```asm
load():
        dmb     sy                      ; <---- data memory barrier here
        movw    r3, #:lower16:.LANCHOR0
        movt    r3, #:upper16:.LANCHOR0
        ldr     r0, [r3]
        dmb     sy                      ; <----- and here
        bx      lr
main:
        push    {r3, lr}
        bl      load()
        movs    r0, #0
        pop     {r3, pc}
```
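For comparison (my own addition, not part of the original question), the x86 asymmetry is worth noting: the cost of seq_cst is typically paid on the store side, which is consistent with the plain-mov load above. A sketch, with the typical codegen described in the comments:

```cpp
#include <atomic>

std::atomic_long al;

// On x86-64, GCC and Clang typically compile this seq_cst store to either
// "mov + mfence" or a single implicitly-locked "xchg". The full fence is
// paid once, on the store side, which is why the seq_cst load can remain
// a plain "mov" while the seq_cst total order is still preserved.
void store_seq_cst(long v) {
    al.store(v, std::memory_order_seq_cst);
}

// Compiles to a plain "mov" on x86-64, as shown in the listing above.
long load_seq_cst() {
    return al.load(std::memory_order_seq_cst);
}
```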

This is not an academic question; it led to a subtle race condition in our code, which calls into question my understanding of the behavior of std::atomic.

2 answers

Sigh, it's too long for a comment:

Doesn't that mean that an atomic write "instantly becomes visible to the rest of the system"?

I would say yes and no to this, depending on how you look at it. For writes with SEQ_CST, yes. But for how atomic loads are handled, check out §29.3 of the C++11 standard. In particular, 29.3.3 is a really good read, and 29.3.4 may be exactly what you are looking for:

For an atomic operation B that reads the value of an atomic object M, if there is a memory_order_seq_cst fence X sequenced before B, then B observes either the last memory_order_seq_cst modification of M preceding X in the total order S or a later modification of M in its modification order.

Basically, SEQ_CST forces a global order exactly as the standard says, but a read can still return a stale value without violating the SEQ_CST constraint.

To actually "get the absolute latest value", you need to perform an operation that locks the hardware coherency arbitration (a lock-prefixed instruction on x86_64). That is what atomic compare-and-exchange operations do, if you look at the generated assembly.
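To illustrate (a sketch of my own, not from the answer): a read-modify-write must acquire the cache line exclusively, so on x86-64 compilers emit a lock-prefixed instruction for it, unlike the plain mov emitted for a seq_cst load. fetch_add(0) is a common way to express a "fully fenced" read:

```cpp
#include <atomic>

std::atomic<long> al{0};

// Compiles to a plain "mov" on x86-64: no lock prefix, no fence.
long plain_load() {
    return al.load(std::memory_order_seq_cst);
}

// A read-modify-write with an identity update. On x86-64 this becomes a
// lock-prefixed instruction (e.g. "lock xadd"), which participates in the
// cache-coherency arbitration described above.
long rmw_load() {
    return al.fetch_add(0, std::memory_order_seq_cst);
}
```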


Is it wrong to assume that atomic::load should also act as a memory barrier, ensuring that all previous non-atomic writes become visible to other threads?

Yes. atomic::load(SEQ_CST) only enforces that the read cannot load an "invalid" value, and that neither loads nor stores can be reordered around that statement by the compiler or the CPU. It does not mean that you will always get the most recent value.

I would expect your code to have a data race because, again, barriers do not guarantee that the most recent value is seen at a given point in time; they only prevent reordering.

It is perfectly fine for Thread1 not to see Thread2's write and therefore not set triggered, and for Thread2 not to see Thread1's write (again, not setting triggered), because you only write "atomically" from one thread.

With two threads writing and reading shared values, you need a barrier in each thread to maintain consistency. It sounds like you already knew this, based on the comments in your code, so I'll just leave it at "the C++ standard is somewhat misleading when it comes to precisely describing the semantics of atomic/multithreaded operations".

Even though you are writing C++, it is still better, in my opinion, to think about what you are doing in terms of the underlying architecture.

Not sure I explained it well, but I would be happy to go into more detail if you want.


Source: https://habr.com/ru/post/1214381/
