C ++ implementation of shared_mutex

boost::shared_mutex or std::shared_mutex (C ++ 17) can be used for one recorder, for multiple readers. As a training exercise, I put together a simple implementation that uses spinlock and has other limitations (for example, an honesty policy), but it is obviously not intended for use in real applications.

The idea is that the mutex keeps the reference count equal to zero if no thread holds the lock. If> 0, the value represents the number of readers who have access. If -1, one author has access.

Is this the correct implementation (in particular, with the minimum memory order used), in which there are no data races?

 #include <atomic> class my_shared_mutex { std::atomic<int> refcount{0}; public: void lock() // write lock { int val; do { val = 0; // Can only take a write lock when refcount == 0 } while (!refcount.compare_exchange_weak(val, -1, std::memory_order_acquire)); // can memory_order_relaxed be used if only a single thread takes write locks ? } void unlock() // write unlock { refcount.store(0, std::memory_order_release); } void lock_shared() // read lock { int val; do { do { val = refcount.load(std::memory_order_relaxed); } while (val == -1); // spinning until the write lock is released } while (!refcount.compare_exchange_weak(val, val+1, std::memory_order_acquire)); } void unlock_shared() // read unlock { // This must be a release operation (see answer) refcount.fetch_sub(1, std::memory_order_relaxed); } }; 
+11
c ++ multithreading mutex c ++ 11 stdatomic
source share
1 answer

(CAS = Compare And Swap = C ++ compare_exchange_weak function that usually compiles in x86 into the x86 lock cmpxchg , which can be executed only if it owns a cache line in Exclusive or Modified. MESI state).


lock_shared looks good: a read-only rotation with a CAS attempt only when it seems possible is better for performance than a CAS rotation or atomic increment. You should have already done a read-only check to avoid changing -1 to 0 and unlocking write locks.

On the x86 platform, put _mm_pause() in the repetition path of the rotation cycle to avoid the kernel errors of the pipeline of incorrect speculation of the memory order when exiting the read-only rotation cycle, and during the rotation, steal less resources from another hyper-thread. (Use a while() rather than do{}while() , so a pause only starts after a single failure. Pause on Skylake and waits around 100 loops later, so avoid this in a quick way.)


I think unlock_shared should use mo_release , not mo_relaxed , since it needs to order the downloads from the general data structure to make sure that the recorder does not start recording before loading from the critical section of the reader., ( Reordering in LoadStore is typical for loosely ordered architectures, although x86 only reorders in StoreLoad.) The Release operation orders previous downloads and holds them in the critical section .


(in lock record): // is it possible to use memory_order_relaxed if only one thread blocks the record?

No, you still need to store entries in the critical section, so CAS still needs to be synchronized with (in C ++ terminology) release stores from unlock_shared .

https://preshing.com/20120913/acquire-and-release-semantics/ has a good image that shows the one-way barrier effect of a release store or acquisition-download.

+6
source share

All Articles