Why is std::equal much slower than a manual loop for two small std::array?

Question

I was profiling a small piece of code that is part of a larger simulation and, to my surprise, the STL function std::equal is much slower than a simple for loop that compares two arrays element by element. I wrote a small test case, which I believe is a fair comparison between the two, and the difference using g++ 6.1.1 from the Debian archives is not insignificant. I am comparing two four-element arrays of signed integers. I tested std::equal, operator==, and a small for loop. I did not use std::chrono for exact timing, but the difference can be seen plainly with time ./a.out.

My question is, given the sample code below, why do operator== and the overloaded function std::equal (which calls operator==, I believe) take approximately 40 seconds to complete, while the hand-written loop takes only 8 seconds? I am using a recent Intel-based laptop. The for loop is faster at all optimization levels: -O1, -O2, -O3, and -Ofast. I compiled the code with g++ -std=c++14 -Ofast -march=native -mtune=native.

Run code

The loop runs a huge number of times just to make the difference noticeable to the naked eye. The modulo operations are a cheap operation on one element of each array, and serve to keep the compiler from optimizing the comparison out of the loop.

    #include <algorithm>
    #include <array>
    #include <cstdint>   // int32_t, uint32_t
    #include <cstdlib>   // EXIT_SUCCESS
    #include <iostream>

    using namespace std;

    using T = array<int32_t, 4>;

    // Hand-written element-by-element comparison.
    bool are_equal_manual(const T& L, const T& R) noexcept {
        bool test{ true };
        for (uint32_t i{0}; i < 4; ++i) {
            test = test && (L[i] == R[i]);
        }
        return test;
    }

    // Comparison through std::equal.
    bool are_equal_alg(const T& L, const T& R) noexcept {
        bool test{ equal(cbegin(L), cend(L), cbegin(R)) };
        return test;
    }

    int main(int argc, char** argv) {
        T left{ {0,1,2,3} };
        T right{ {0,1,2,3} };

        cout << boolalpha << are_equal_manual(left, right) << endl;
        cout << boolalpha << are_equal_alg(left, right) << endl;
        cout << boolalpha << (left == right) << endl;

        bool t{};
        const size_t N{ 5000000000 };
        for (size_t i{}; i < N; ++i) {
            //t = left == right;                  // SLOW
            //t = are_equal_manual(left, right);  // FAST
            t = are_equal_alg(left, right);       // SLOW
            left[0] = i % 10;
            right[2] = i % 8;
        }

        cout << boolalpha << t << endl;
        return(EXIT_SUCCESS);
    }

c++ performance stl c++14 gcc6

KBentley57 Sep 01 '16 at 3:59

1 answer

Michael Burr · Answer 1 · 2016-09-01T04:42:43+0000

This is the assembly generated for the for loop in main() when the are_equal_manual(left,right) function is used:

    .L21:
        xor     esi, esi
        test    eax, eax
        jne     .L20
        cmp     edx, 2
        sete    sil
    .L20:
        mov     rax, rcx
        movzx   esi, sil
        mul     r8
        shr     rdx, 3
        lea     rax, [rdx+rdx*4]
        mov     edx, ecx
        add     rax, rax
        sub     edx, eax
        mov     eax, edx
        mov     edx, ecx
        add     rcx, 1
        and     edx, 7
        cmp     rcx, rdi

And this is what is generated when using the are_equal_alg(left,right) function:

    .L20:
        lea     rsi, [rsp+16]
        mov     edx, 16
        mov     rdi, rsp
        call    memcmp
        mov     ecx, eax
        mov     rax, rbx
        mov     rdi, rbx
        mul     r12
        shr     rdx, 3
        lea     rax, [rdx+rdx*4]
        add     rax, rax
        sub     rdi, rax
        mov     eax, ebx
        add     rbx, 1
        and     eax, 7
        cmp     rbx, rbp
        mov     DWORD PTR [rsp], edi
        mov     DWORD PTR [rsp+24], eax
        jne     .L20

I'm not quite sure what is going on in the generated code for the first case, but it clearly does not call memcmp(). It doesn't seem to compare the contents of the arrays at all. Although the loop still iterates 5,000,000,000 times, it has been optimized to do nothing. The loop that uses are_equal_alg(left,right), however, still performs the comparison, through a call to memcmp() on every iteration. Basically, the compiler is able to optimize the manual comparison much better than the std::equal template.
