C++0x memory_order without fences: applications, and chips that support it

As a continuation of my previous question: the atomic<T> class takes a memory_order parameter on most of its operations. Unlike a fence, this memory ordering applies only to the atomic variable it operates on. Presumably, using several of these atomics, you can build a parallel algorithm in which the ordering of all other memory is inconsequential.

I have two questions:

  • Can someone point me to an example of an algorithm/situation that benefits from ordering individual atomic variables and does not require fences?
  • What modern processors support this kind of behavior? That is, on which of them does the compiler not simply translate the specified ordering into an ordinary fence?
c++ atomic c++11 memory-fences
1 answer

The memory ordering parameter on operations on std::atomic<T> variables does not affect the ordering of that operation as such; it affects the ordering relationships the operation creates with other operations.

e.g. a.store(std::memory_order_release) by itself says nothing about how operations on a are ordered relative to anything else, but combined with a call to a.load(std::memory_order_acquire) from another thread it orders other operations: all writes to other variables (including non-atomic ones) made by the thread that performed the store to a are visible to the thread that performed the load, provided that load reads the stored value.
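For illustration, here is a minimal message-passing sketch (my own example, not part of the original question or answer) of exactly that guarantee: the release store publishes a plain non-atomic write, and the acquire load that reads the flag is guaranteed to see it:

 #include <atomic>
 #include <cassert>
 #include <thread>

 int payload = 0;                  // plain, non-atomic data
 std::atomic<bool> ready{false};   // flag ordered with release/acquire

 void producer() {
     payload = 42;                                  // non-atomic write
     ready.store(true, std::memory_order_release);  // publishes everything written above
 }

 void consumer() {
     while (!ready.load(std::memory_order_acquire)) // pairs with the release store
         ;                                          // spin until the flag is set
     assert(payload == 42);  // guaranteed once the acquire load has read 'true'
 }

 int main() {
     std::thread t1(producer);
     std::thread t2(consumer);
     t1.join();
     t2.join();
 }

No fences are needed here: the ordering is attached to the two operations on ready alone, and any other atomics in the program could use memory_order_relaxed without affecting this guarantee.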

On modern processors, some memory orderings are no-ops. For example, on x86, memory_order_acquire, memory_order_consume and memory_order_release are implicit in ordinary load and store instructions and do not require separate fence instructions. In these cases, the orderings can only affect instruction reordering by the compiler.

Clarification: the fact that these orderings are implicit in the instructions means the compiler does not need to emit any explicit fence instructions, provided all the memory-ordering constraints are attached to the individual operations on the atomic variables. If you instead use memory_order_relaxed for everything and add explicit barriers, the compiler may well have to emit those barriers as actual instructions.

e.g. on x86, the XCHG instruction carries an implicit memory_order_seq_cst fence. Thus there is no difference in the code generated for the two exchange operations below on x86; they both map to the same XCHG instruction:

 std::atomic<int> ai;
 ai.exchange(3, std::memory_order_relaxed);
 ai.exchange(3, std::memory_order_seq_cst);

However, I do not yet know of any compiler that will elide the explicit fence instructions in the following code:

 std::atomic_thread_fence(std::memory_order_seq_cst);
 ai.exchange(3, std::memory_order_relaxed);
 std::atomic_thread_fence(std::memory_order_seq_cst);

I expect compilers will handle this optimization eventually, but there are other similar cases where attaching the ordering to the individual operations, with their implicit fences, allows better optimization than explicit fences do.

Also, std::memory_order_consume can only be applied to operations directly on the atomic variables.
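For illustration only, here is a minimal sketch (my own example, with assumed names Node, head and publish) of the kind of per-variable operation consume applies to: the consume load orders exactly those later reads that carry a data dependency on the loaded pointer:

 #include <atomic>

 struct Node { int value; };

 std::atomic<Node*> head{nullptr};

 void publish(Node* n) {
     n->value = 7;                              // initialise the payload
     head.store(n, std::memory_order_release);  // publish the pointer
 }

 int read_payload() {
     Node* n;
     while (!(n = head.load(std::memory_order_consume)))  // dependency-ordered load
         ;                                                // spin until published
     return n->value;  // carries a data dependency on n, so it is guaranteed to see 7
 }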

