Why is an acquire barrier needed before deleting the data in an atomically reference-counted smart pointer?

Boost provides a sample atomically reference-counted shared pointer.

Here is the relevant code snippet and an explanation of the memory orderings used:

class X {
public:
    typedef boost::intrusive_ptr<X> pointer;
    X() : refcount_(0) {}

private:
    mutable boost::atomic<int> refcount_;

    friend void intrusive_ptr_add_ref(const X * x)
    {
        x->refcount_.fetch_add(1, boost::memory_order_relaxed);
    }

    friend void intrusive_ptr_release(const X * x)
    {
        if (x->refcount_.fetch_sub(1, boost::memory_order_release) == 1) {
            boost::atomic_thread_fence(boost::memory_order_acquire);
            delete x;
        }
    }
};

Increasing the reference counter can always be done with memory_order_relaxed: new references to an object can only be formed from an existing reference, and passing an existing reference from one thread to another must already provide any required synchronization.
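As a minimal sketch of that point (using std::atomic rather than boost::atomic; the memory-order semantics are the same, and the names Counted and add_ref are my own, not from the Boost sample):

```cpp
#include <atomic>

// A new reference can only be created by a thread that already holds a
// valid reference, so the object is guaranteed to stay alive across the
// increment and no ordering with other memory operations is needed --
// memory_order_relaxed is enough.
struct Counted {
    std::atomic<int> refcount{1};  // the creating thread holds one reference
};

void add_ref(Counted* c) {
    c->refcount.fetch_add(1, std::memory_order_relaxed);
}
```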

It is important to enforce that any possible access to the object in one thread (through an existing reference) happens before the object is deleted in another thread. This is achieved by a "release" operation after dropping a reference (any access to the object through this reference must obviously have happened before), and an "acquire" operation before deleting the object.

It would be possible to use memory_order_acq_rel for the fetch_sub operation, but this results in unneeded "acquire" operations when the reference counter has not yet reached zero and may impose a performance penalty.
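The trade-off can be sketched as two release paths side by side (a sketch with std::atomic rather than boost::atomic; the function names are mine, not the Boost sample's):

```cpp
#include <atomic>

struct X {
    mutable std::atomic<int> refcount_{0};
};

// The form used in the snippet above: a plain release on every decrement,
// with the acquire fence paid only on the final decrement.
void release_fenced(const X* x) {
    if (x->refcount_.fetch_sub(1, std::memory_order_release) == 1) {
        std::atomic_thread_fence(std::memory_order_acquire);
        delete x;
    }
}

// The alternative: acq_rel on every decrement. Also correct, but the
// "acquire" half is paid even when the counter has not reached zero yet.
void release_acq_rel(const X* x) {
    if (x->refcount_.fetch_sub(1, std::memory_order_acq_rel) == 1) {
        delete x;
    }
}
```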

I cannot understand why the memory_order_acquire barrier is needed before the delete x operation. Specifically, how could the compiler/processor safely reorder the memory operations of delete x before the fetch_sub and the test for == 1 without violating single-threaded semantics?

EDIT: I think my question was not very clear; here is a rephrased version:

Does the control dependency between the load part of x->refcount_.fetch_sub(1, boost::memory_order_release) == 1 and the delete x operation provide any ordering guarantee? Even considering only a single-threaded program, is it possible for the compiler/processor to reorder the instructions corresponding to delete x before the fetch_sub and the comparison? It would be very helpful if the answer were as low-level as possible and included an example scenario in which the delete gets reordered (without affecting single-threaded semantics), illustrating the need to enforce the ordering.

+8
c++ multithreading boost shared-memory atomic
2 answers

Consider two threads, each holding one reference to the object, these being the last two references:

 ------------------------------------------------------------
 Thread 1                           Thread 2
 ------------------------------------------------------------
 // play with x here
 fetch_sub(...)
                                    fetch_sub(...)
                                    // nothing
                                    delete x;

You must make sure that any changes made to the object by Thread 1 in // play with x here are visible to Thread 2 when it calls delete x; . To achieve this you need the acquire fence, which, together with the memory_order_release on the fetch_sub() calls, guarantees that the changes made by Thread 1 are visible.
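A runnable sketch of the scenario in the table (using std::atomic and std::thread; the names release, run_scenario and last_seen are illustrative, not from the question's code):

```cpp
#include <atomic>
#include <thread>

// Whichever thread performs the last fetch_sub must observe the other
// thread's plain writes before running the destructor. The acquire fence
// pairs with the other thread's fetch_sub(..., memory_order_release).
struct X {
    std::atomic<int> refcount{2};  // both threads hold a reference
    int data = 0;
};

int last_seen = -1;  // what the deleting thread observed in x->data

void release(X* x) {
    if (x->refcount.fetch_sub(1, std::memory_order_release) == 1) {
        std::atomic_thread_fence(std::memory_order_acquire);
        last_seen = x->data;  // guaranteed to see the write made by t1
        delete x;
    }
}

int run_scenario() {
    X* x = new X;
    std::thread t1([x] {
        x->data = 42;   // "play with x here"
        release(x);
    });
    std::thread t2([x] {
        release(x);
    });
    t1.join();
    t2.join();
    return last_seen;
}
```

Whichever thread happens to drop the last reference, the release/acquire pairing guarantees it sees data == 42 before deleting.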

+5

From http://en.cppreference.com/w/cpp/atomic/memory_order :

memory_order_acquire — a load operation with this memory order performs the acquire operation on the affected memory location: writes made to other memory locations by the thread that did the release become visible in this thread.

...

Release-Acquire ordering

If an atomic store in thread A is tagged std::memory_order_release and an atomic load in thread B from the same variable is tagged std::memory_order_acquire, all memory writes (non-atomic and relaxed atomic) that happened-before the atomic store from the point of view of thread A become visible side effects in thread B; that is, once the atomic load completes, thread B is guaranteed to see everything thread A wrote to memory.

The synchronization is established only between the threads releasing and acquiring the same atomic variable. Other threads can see a different order of memory accesses than either or both of the synchronized threads.

On strongly-ordered systems (x86, SPARC TSO, IBM mainframe), release-acquire ordering is automatic for the majority of operations. No additional CPU instructions are issued for this synchronization mode; only certain compiler optimizations are affected (for example, the compiler is prohibited from moving non-atomic stores past the atomic store-release, or from performing non-atomic loads earlier than the atomic load-acquire). On weakly-ordered systems (ARM, Itanium, PowerPC), special CPU load or memory-fence instructions have to be used.

This means that a release lets other threads synchronize with the pending writes of the current thread, while a subsequent acquire pulls in all published modifications from other threads.

On strongly ordered systems this hardly matters. I do not think these operations even generate extra code, since the processor automatically acquires cache lines before writes can occur, and the cache is guaranteed to be coherent. But on weakly ordered systems, although the atomic operations themselves are well defined, there may still be pending writes to other parts of memory.

So, take threads A and B, both of which hold a reference to some data D:

  • A, holding its reference, modifies D
  • A drops its reference
  • B drops its reference, sees the count reach 0 and therefore decides to delete D
  • B deletes D
  • ... but the writes pending from step 1 are not yet visible to B, so bad things happen.

With the acquire fence before the deletion, the deleting thread synchronizes with the pending release operations of the other threads in its address space, so when the deletion happens it sees what A did in step 1.

+2
