After fixing the accumulate problem that others pointed out, I tested with both Visual Studio 2008 and 2010, and std::accumulate was indeed faster than the manual loop.
Looking at the disassembly, I saw that an additional iterator check was being performed in the manual loop, so I switched to a plain array to eliminate it.
Here is what I ended up with:

    #include <Windows.h>
    #include <iostream>
    #include <numeric>
    #include <stdlib.h>

    int main()
    {
        const size_t vsize = 100*1000*1000;

        int* x = new int[vsize];
        for (size_t i = 0; i < vsize; i++) x[i] = rand() % 1000;

        LARGE_INTEGER start, stop;
        long long suma = 0, sumb = 0, timea = 0, timeb = 0;

        QueryPerformanceCounter( &start );
        suma = std::accumulate(x, x + vsize, 0LL);
        QueryPerformanceCounter( &stop );
        timea = stop.QuadPart - start.QuadPart;

        QueryPerformanceCounter( &start );
        for (size_t i = 0; i < vsize; ++i) sumb += x[i];
        QueryPerformanceCounter( &stop );
        timeb = stop.QuadPart - start.QuadPart;

        std::cout << "Accumulate: " << timea << " - " << suma << std::endl;
        std::cout << " Loop: " << timeb << " - " << sumb << std::endl;

        delete [] x;
        return 0;
    }

Output (times are raw QueryPerformanceCounter ticks):

    Accumulate: 633942 - 49678806711
     Loop: 292642 - 49678806711
With this code, the manual loop easily beats std::accumulate. The big difference is that the compiler unrolled the manual loop 4 times; otherwise the generated code is almost identical.
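To make the unrolling concrete, here is a rough source-level sketch of what "unrolled 4 times" means. This is my own illustration, not the compiler's actual output: the helper names `sum_unrolled4` and `sum_accumulate` are hypothetical, and the real generated code may arrange the additions differently.

```cpp
#include <cstddef>
#include <numeric>

// Hypothetical helper: sums n ints with the loop body unrolled 4 times by hand.
// It only illustrates the idea; it is not the code the compiler generated.
long long sum_unrolled4(const int* x, std::size_t n)
{
    long long sum = 0;
    std::size_t i = 0;

    // Main loop: four additions per iteration, so the loop counter is
    // tested and incremented once per four elements instead of once per element.
    for (; i + 4 <= n; i += 4) {
        sum += x[i];
        sum += x[i + 1];
        sum += x[i + 2];
        sum += x[i + 3];
    }

    // Remainder loop: handles the last n % 4 elements.
    for (; i < n; ++i)
        sum += x[i];

    return sum;
}

// Reference: the same sum via std::accumulate with a long long initial value.
long long sum_accumulate(const int* x, std::size_t n)
{
    return std::accumulate(x, x + n, 0LL);
}
```

Both functions should return the same total for any input, for example `sum_unrolled4(x, vsize) == sum_accumulate(x, x_size)` with the array from the test above; whether the hand-unrolled version is actually faster depends on the compiler and its optimization settings.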
Retired ninja