Using non-atomic and atomic operations at the same time

I have a thread pool, and each thread has its own counter (essentially TLS).

The main thread needs to update frequently by computing the sum of all the thread-local counters.

In most cases, each thread increments its own counter, so synchronization is not required.

But while the main thread is updating, I of course need some kind of synchronization.
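One portable pattern for exactly this situation is a single-writer counter: keep each counter in a std::atomic, but have the owning thread increment it with a relaxed load followed by a relaxed store instead of fetch_add. Because only one thread ever writes a given counter, no increment can be lost, and because every access is atomic, the main thread's reads are not a data race. A minimal sketch, assuming a fixed pool size known up front and a 64-byte cache line; PerThreadCounter, sum and run_pool_demo are hypothetical names:

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// One counter per worker. alignas(64) (assumed cache-line size) keeps
// neighboring counters on separate lines to avoid false sharing.
struct alignas(64) PerThreadCounter {
    std::atomic<std::uint64_t> value{0};

    // Called only by the owning thread: relaxed load + relaxed store,
    // no locked read-modify-write, but still an atomic access.
    void increment() {
        value.store(value.load(std::memory_order_relaxed) + 1,
                    std::memory_order_relaxed);
    }
};

// Called by the main thread at any time. Each read is atomic; the total
// is a snapshot that may lag slightly behind the workers.
std::uint64_t sum(const std::vector<PerThreadCounter>& counters) {
    std::uint64_t total = 0;
    for (const auto& c : counters) total += c.value.load(std::memory_order_relaxed);
    return total;
}

// Usage sketch: each of `threads` workers bumps only its own slot.
std::uint64_t run_pool_demo(std::size_t threads, std::uint64_t per_thread) {
    std::vector<PerThreadCounter> counters(threads);
    std::vector<std::thread> pool;
    for (std::size_t t = 0; t < threads; ++t)
        pool.emplace_back([&counters, t, per_thread] {
            for (std::uint64_t i = 0; i < per_thread; ++i) counters[t].increment();
        });
    for (auto& th : pool) th.join();
    return sum(counters);
}
```

On x86 the relaxed load and store compile to ordinary moves, so the hot path carries no lock prefix, while the code stays standard C++11 and portable to other compilers and architectures.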

I came up with the MSVS built-in functions (the _InterlockedXXX intrinsics), and they showed excellent performance (~0.8 s in my test). However, they tie my code to MSVC compilers and x86/AMD64 platforms. Is there a portable C++ way to do this?

  • I tried changing the counter's type from int to std::atomic<int>, using std::memory_order_relaxed for the increment, but this solution is very slow (~4 s)!

  • Accessing the underlying member std::atomic<T>::_My_val directly is non-atomic, as I want, but it isn't portable either, so the problem is the same...

  • Using just one std::atomic<int> shared by all threads is even slower because of heavy contention (~10 s).

Do you have any ideas? Should I use a library such as Boost? Or write my own class?
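If the counters really live in thread-local storage, the same single-writer idea can be packaged without Boost: each thread lazily registers one slot in a global registry, increments it without locking, and the main thread sums the registry under a mutex that protects only the container, not the counts. A sketch under those assumptions (one global registry, slots never freed, 64-byte cache line); tls_counter, local_slot, total and run_tls_demo are hypothetical names:

```cpp
#include <atomic>
#include <cstdint>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

namespace tls_counter {

// One slot per thread, padded to an (assumed) 64-byte cache line.
struct alignas(64) Slot {
    std::atomic<std::uint64_t> value{0};
};

std::mutex registry_mutex;  // guards the container itself, not the counts
std::deque<Slot> registry;  // emplace_back never moves existing elements

// The calling thread's slot, registered under the mutex on first use only.
Slot& local_slot() {
    thread_local Slot* slot = nullptr;
    if (slot == nullptr) {
        std::lock_guard<std::mutex> lock(registry_mutex);
        registry.emplace_back();
        slot = &registry.back();
    }
    return *slot;
}

// Worker threads: only the owner writes its slot, so load + store suffices.
void increment() {
    Slot& s = local_slot();
    s.value.store(s.value.load(std::memory_order_relaxed) + 1,
                  std::memory_order_relaxed);
}

// Main thread: snapshot of every slot, including those of finished threads.
std::uint64_t total() {
    std::lock_guard<std::mutex> lock(registry_mutex);
    std::uint64_t sum = 0;
    for (const Slot& s : registry) sum += s.value.load(std::memory_order_relaxed);
    return sum;
}

}  // namespace tls_counter

// Usage sketch: `threads` workers each count to `per_thread`.
std::uint64_t run_tls_demo(unsigned threads, std::uint64_t per_thread) {
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t)
        pool.emplace_back([per_thread] {
            for (std::uint64_t i = 0; i < per_thread; ++i) tls_counter::increment();
        });
    for (auto& th : pool) th.join();
    return tls_counter::total();
}
```

std::deque is chosen because adding elements at its ends never invalidates references to existing ones, so the pointer each thread caches in its thread_local stays valid while other threads register.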

3 answers

std::atomic<int>::fetch_add(1, std::memory_order_relaxed) performs just as fast as _InterlockedIncrement.

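Visual Studio compiles the first to lock add $1 and the second to lock inc, but there is no difference in run time: on my machine (Core i5 @ 3.30 GHz) both take about 5630 ps/op, roughly 18.5 clock cycles.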

Microbenchmark with Benchpress:

#define BENCHPRESS_CONFIG_MAIN
#include "benchpress/benchpress.hpp"
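#include <atomic>
#include <cstddef>   // size_t
#include <intrin.h>  // MSVC-only: _InterlockedIncrement

std::atomic<long> counter;
// Portable: relaxed atomic increment (still a locked read-modify-write on x86).
void f1(std::atomic<long>& counter) { counter.fetch_add(1, std::memory_order_relaxed); }
// MSVC intrinsic applied to the atomic's storage; relies on std::atomic<long> having
// the same layout as long, which MSVC provides in practice but the standard doesn't promise.
void f2(std::atomic<long>& counter) { _InterlockedIncrement((long*)&counter); }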
BENCHMARK("fetch_add_1", [](benchpress::context* ctx) {
    auto& c = counter; for (size_t i = 0; i < ctx->num_iterations(); ++i) { f1(c); }
})
BENCHMARK("intrin", [](benchpress::context* ctx) {
    auto& c = counter; for (size_t i = 0; i < ctx->num_iterations(); ++i) { f2(c); }
})

Results:

fetch_add_1                           200000000        5634 ps/op
intrin                                200000000        5637 ps/op
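For a portable comparison without Benchpress or <intrin.h>, a rough std::chrono loop can time fetch_add next to the single-writer load/store increment. This is only a sketch: ns_per_op, rmw_increment and single_writer_increment are hypothetical names, single-threaded timings like these say nothing about contention, and no particular numbers are implied.

```cpp
#include <atomic>
#include <chrono>

// Run `op` on `counter` n times and return the average cost in nanoseconds.
template <class Op>
double ns_per_op(std::atomic<long>& counter, long n, Op op) {
    auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < n; ++i) op(counter);
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::nano>(stop - start).count() / n;
}

// Locked read-modify-write: correct with any number of writers.
void rmw_increment(std::atomic<long>& c) {
    c.fetch_add(1, std::memory_order_relaxed);
}

// Relaxed load + store: correct only while this thread is the sole writer.
void single_writer_increment(std::atomic<long>& c) {
    c.store(c.load(std::memory_order_relaxed) + 1, std::memory_order_relaxed);
}
```

Calling ns_per_op(c, 100000000L, rmw_increment) and then ns_per_op(c, 100000000L, single_writer_increment) on the same counter gives the two per-operation costs; on x86 one would expect the second to be cheaper because it avoids the lock prefix, but that is worth measuring rather than assuming.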

Here is my attempt; the part I'm stuck on is semi_atomic<T>::Set():

#include <atomic>
#include <intrin.h>  // MSVC-only: _InterlockedExchange

template <class T>
class semi_atomic {
    T Val;
    std::atomic<T> AtomicVal;
public:
    semi_atomic() : Val(0), AtomicVal(0) {}
    // Increment has no need for synchronization.
    T Increment() {
        return ++Val;
    }
    // Store the non-atomic value atomically and return it.
    T Get() {
        AtomicVal.store(Val, std::memory_order_release);
        return AtomicVal.load(std::memory_order_relaxed);
    }
    // Load _Val into Val, but in an atomic way (?)
    void Set(T _Val) {
        _InterlockedExchange((volatile long*)&Val, _Val); // And with C++11 ??
    }
};
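On the "And with C++11 ??" question in Set(): one portable option is to drop the separate Val/AtomicVal pair and make the stored value itself a std::atomic<T>, with exchange() playing the role of _InterlockedExchange. A sketch, assuming exactly one owner thread calls Increment(). Caveat: if another thread calls Set() while the owner is inside Increment(), the owner's store can overwrite the new value, so a reset from the main thread needs either fetch_add in Increment() or a protocol that keeps the two from overlapping.

```cpp
#include <atomic>
#include <type_traits>

// Portable variant of semi_atomic<T> (hypothetical): the value is atomic,
// so reads and writes from different threads never form a data race.
template <class T>
class semi_atomic {
    static_assert(std::is_integral<T>::value, "semi_atomic<T> needs an integral T");
    std::atomic<T> val_{T(0)};

public:
    // Owner thread only: relaxed load + store, no locked read-modify-write.
    T Increment() {
        T next = static_cast<T>(val_.load(std::memory_order_relaxed) + 1);
        val_.store(next, std::memory_order_relaxed);
        return next;
    }

    // Any thread: atomic snapshot of the current value.
    T Get() const { return val_.load(std::memory_order_relaxed); }

    // Any thread: atomic swap that returns the previous value, like
    // _InterlockedExchange. Can be lost if it overlaps the owner's Increment().
    T Set(T v) { return val_.exchange(v); }
};
```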


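My workaround: instead of std::atomic<int>, I read a plain integer through a volatile cast and write it with an interlocked intrinsic.

This works (only) on x86 and AMD64.

Here sInt is a 32-bit or 64-bit integer type, depending on the platform.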

// Here is the magic: read the plain integer through a volatile pointer
inline sInt MyInt::GetValue() {
    return *(volatile sInt*)&Value;
}

// Interlocked intrinsic is atomic
inline void MyInt::SetValue(sInt _Value) {
#ifdef _M_IX86
    _InterlockedExchange((volatile sInt *)&Value, _Value);
#else
    _InterlockedExchange64((volatile sInt *)&Value, _Value);
#endif
}

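This relies on MSVS and x86/AMD64, in particular GetValue(): the volatile read is atomic only because aligned loads of this size are atomic on those platforms.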
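A portable counterpart of MyInt can keep the same interface on top of std::atomic: a relaxed load replaces the volatile read, and exchange() replaces the interlocked intrinsics. A minimal sketch, assuming std::intptr_t is an acceptable stand-in for sInt (it is 32-bit on x86 and 64-bit on AMD64):

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical portable MyInt with the same GetValue()/SetValue() interface.
class MyInt {
    std::atomic<std::intptr_t> Value{0};

public:
    // Replaces *(volatile sInt*)&Value: an atomic read with no ordering cost;
    // on x86/AMD64 a relaxed load is an ordinary mov.
    std::intptr_t GetValue() const { return Value.load(std::memory_order_relaxed); }

    // Replaces _InterlockedExchange / _InterlockedExchange64: an atomic swap
    // with sequential consistency, comparable to the intrinsics' full barrier.
    void SetValue(std::intptr_t v) { Value.exchange(v); }
};
```

Unlike the volatile cast, this is well-defined in standard C++ on any compiler, and the choice of 32- or 64-bit width is handled by the type instead of an #ifdef.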
