A recent discussion made me wonder how expensive an atomic increment is compared to a regular whole increment.
I wrote code to try and compare this:
#include <iostream> #include <atomic> #include <chrono> static const int NUM_TEST_RUNS = 100000; static const int ARRAY_SIZE = 500; void runBenchmark(std::atomic<int>& atomic_count, int* count_array, int array_size, bool do_atomic_increment){ for(int i = 0; i < array_size; ++i){ ++count_array[i]; } if(do_atomic_increment){ ++atomic_count; } } int main(int argc, char* argv[]){ int num_test_runs = NUM_TEST_RUNS; int array_size = ARRAY_SIZE; if(argc == 3){ num_test_runs = atoi(argv[1]); array_size = atoi(argv[2]); } if(num_test_runs == 0 || array_size == 0){ std::cout << "Usage: atomic_operation_overhead <num_test_runs> <num_integers_in_array>" << std::endl; return 1; } // Instantiate atomic counter std::atomic<int> atomic_count; // Allocate the integer buffer that will be updated every time int* count_array = new int[array_size]; // Track the time elapsed in case of incrmeenting with mutex locking auto start = std::chrono::steady_clock::now(); for(int i = 0; i < num_test_runs; ++i){ runBenchmark(atomic_count, count_array, array_size, true); } auto end = std::chrono::steady_clock::now(); // Calculate time elapsed for incrementing without mutex locking auto diff_with_lock = std::chrono::duration_cast<std::chrono::nanoseconds>(end - start); std::cout << "Elapsed time with atomic increment for " << num_test_runs << " test runs: " << diff_with_lock.count() << " ns" << std::endl; // Track the time elapsed in case of incrementing without a mutex locking start = std::chrono::steady_clock::now(); for(unsigned int i = 0; i < num_test_runs; ++i){ runBenchmark(atomic_count, count_array, array_size, false); } end = std::chrono::steady_clock::now(); // Calculate time elapsed for incrementing without mutex locking auto diff_without_lock = std::chrono::duration_cast<std::chrono::nanoseconds>(end - start); std::cout << "Elapsed time without atomic increment for " << num_test_runs << " test runs: " << diff_without_lock.count() << " ns" << std::endl; auto difference_running_times = diff_with_lock - diff_without_lock; auto proportion = difference_running_times.count() / (double)diff_without_lock.count(); std::cout << "How much slower was locking: " << proportion * 100.0 << " %" << std::endl; // We loop over all entries in the array and print their sum // We do this mainly to prevent the compiler from optimizing out // the loop where we increment all the values in the array int array_sum = 0; for(int i = 0; i < array_size; ++i){ array_sum += count_array[i]; } std::cout << "Array sum (just to prevent loop getting optimized out): " << array_sum << std::endl; delete [] count_array; return 0; }
The problem I am facing is that this program produces results with different discrepancies in each run:
balajeerc@Balajee :~/Projects/Misc$ ./atomic_operation_overhead 1000 500 Elapsed time with atomic increment for 1000 test runs: 99852 ns Elapsed time without atomic increment for 1000 test runs: 96396 ns How much slower was locking: 3.58521 % balajeerc@Balajee :~/Projects/Misc$ ./atomic_operation_overhead 1000 500 Elapsed time with atomic increment for 1000 test runs: 182769 ns Elapsed time without atomic increment for 1000 test runs: 138319 ns How much slower was locking: 32.1359 % balajeerc@Balajee :~/Projects/Misc$ ./atomic_operation_overhead 1000 500 Elapsed time with atomic increment for 1000 test runs: 98858 ns Elapsed time without atomic increment for 1000 test runs: 96404 ns How much slower was locking: 2.54554 % balajeerc@Balajee :~/Projects/Misc$ ./atomic_operation_overhead 1000 500 Elapsed time with atomic increment for 1000 test runs: 107848 ns Elapsed time without atomic increment for 1000 test runs: 105174 ns How much slower was locking: 2.54245 % balajeerc@Balajee :~/Projects/Misc$ ./atomic_operation_overhead 1000 500 Elapsed time with atomic increment for 1000 test runs: 113865 ns Elapsed time without atomic increment for 1000 test runs: 100559 ns How much slower was locking: 13.232 % balajeerc@Balajee :~/Projects/Misc$ ./atomic_operation_overhead 1000 500 Elapsed time with atomic increment for 1000 test runs: 98956 ns Elapsed time without atomic increment for 1000 test runs: 106639 ns How much slower was locking: -7.20468 %
This makes me think that there might be a mistake in the benchmarking code itself. Is there some kind of error that I am missing? Is using std :: chrono for benchmarking wrong? Or is it a time difference due to the overhead of signal processing in the OS related to atomic operations?
What can i do wrong?
Test bench:
Intel® Core™ i7-4700MQ CPU @ 2.40GHz × 8 8GB RAM GNU/Linux:Ubuntu LTS 14.04 (64 bit) GCC version: 4.8.4 Compilation: g++ -std=c++11 -O3 atomic_operation_overhead.cpp -o atomic_operation_overhead
EDIT: Updated test run output after compilation with -O3 optimization.
EDIT: after running the tests to increase the number of iterations and adding the loop amount to prevent optimization from the loop, as suggested by Adam, I get more converging results:
balajeerc@Balajee :~/Projects/Misc$ ./atomic_operation_overhead 99999999 500 Elapsed time with atomic increment for 99999999 test runs: 7111974931 ns Elapsed time without atomic increment for 99999999 test runs: 6938317779 ns How much slower was locking: 2.50287 % Array sum (just to prevent loop getting optimized out): 1215751192 balajeerc@Balajee :~/Projects/Misc$ ./atomic_operation_overhead 99999999 500 Elapsed time with atomic increment for 99999999 test runs: 7424952991 ns Elapsed time without atomic increment for 99999999 test runs: 7262721866 ns How much slower was locking: 2.23375 % Array sum (just to prevent loop getting optimized out): 1215751192 balajeerc@Balajee :~/Projects/Misc$ ./atomic_operation_overhead 99999999 500 Elapsed time with atomic increment for 99999999 test runs: 7172114343 ns Elapsed time without atomic increment for 99999999 test runs: 7030985219 ns How much slower was locking: 2.00725 % Array sum (just to prevent loop getting optimized out): 1215751192 balajeerc@Balajee :~/Projects/Misc$ ./atomic_operation_overhead 99999999 500 Elapsed time with atomic increment for 99999999 test runs: 7094552104 ns Elapsed time without atomic increment for 99999999 test runs: 6971060941 ns How much slower was locking: 1.77148 % Array sum (just to prevent loop getting optimized out): 1215751192 balajeerc@Balajee :~/Projects/Misc$ ./atomic_operation_overhead 99999999 500 Elapsed time with atomic increment for 99999999 test runs: 7099907902 ns Elapsed time without atomic increment for 99999999 test runs: 6970289856 ns How much slower was locking: 1.85958 % Array sum (just to prevent loop getting optimized out): 1215751192 balajeerc@Balajee :~/Projects/Misc$ ./atomic_operation_overhead 99999999 500 Elapsed time with atomic increment for 99999999 test runs: 7763604675 ns Elapsed time without atomic increment for 99999999 test runs: 7229145316 ns How much slower was locking: 7.39312 % Array sum (just to prevent loop getting optimized out): 1215751192 balajeerc@Balajee :~/Projects/Misc$ ./atomic_operation_overhead 99999999 500 Elapsed time with atomic increment for 99999999 test runs: 7164534212 ns Elapsed time without atomic increment for 99999999 test runs: 6994993609 ns How much slower was locking: 2.42374 % Array sum (just to prevent loop getting optimized out): 1215751192 balajeerc@Balajee :~/Projects/Misc$ ./atomic_operation_overhead 99999999 500 Elapsed time with atomic increment for 99999999 test runs: 7154697145 ns Elapsed time without atomic increment for 99999999 test runs: 6997030700 ns How much slower was locking: 2.25333 % Array sum (just to prevent loop getting optimized out): 1215751192