Why does it accumulate faster than just for a loop?

I tested the algorithms and ran this weird behavior when std::accumulate faster than a simple for loop.

Looking at the generated assembler, I'm not much wiser :-) It seems that the for loop is optimized in MMX instructions, and the drive expands into a loop.

This is the code. Behavior manifests itself with an optimization level of -O3 , gcc 4.7.1

 #include <vector> #include <chrono> #include <iostream> #include <random> #include <algorithm> using namespace std; int main() { const size_t vsize = 100*1000*1000; vector<int> x; x.reserve(vsize); mt19937 rng; rng.seed(chrono::system_clock::to_time_t(chrono::system_clock::now())); uniform_int_distribution<uint32_t> dist(0,10); for (size_t i = 0; i < vsize; i++) { x.push_back(dist(rng)); } long long tmp = 0; for (size_t i = 0; i < vsize; i++) { tmp += x[i]; } cout << "dry run " << tmp << endl; auto start = chrono::high_resolution_clock::now(); long long suma = accumulate(x.begin(),x.end(),0); auto end = chrono::high_resolution_clock::now(); cout << "Accumulate runtime " << chrono::duration_cast<chrono::nanoseconds>(end-start).count() << " - " << suma << endl; start = chrono::high_resolution_clock::now(); suma = 0; for (size_t i = 0; i < vsize; i++) { suma += x[i]; } end = chrono::high_resolution_clock::now(); cout << "Manual sum runtime " << chrono::duration_cast<chrono::nanoseconds>(end-start).count() << " - " << suma << endl; return 0; } 
+8
c ++ performance
source share
3 answers

When you pass 0 for accumulation, you make it accumulate using int instead of long.

If you code your manual loop as follows, it will be equivalent:

 int sumb = 0; for (size_t i = 0; i < vsize; i++) { sumb += x[i]; } suma = sumb; 

or you can call like this:

 long long suma = accumulate(x.begin(),x.end(),0LL); 
+9
source share

I have several different results using Visual Studio 2012

 // original code Accumulate runtime 93600 ms Manual sum runtime 140400 ms 

Note that the source code of std::accumulate not equivalent to the for loop, because the third parameter of std::accumulate is int 0. It performs the summation using int and only at the end saves the result in long long . Changing the third parameter to 0LL forces the algorithm to use the long long battery and leads to the following times.

 // change std::accumulate initial value -> 0LL Accumulate runtime 265200 ms Manual sum runtime 140400 ms 

Since the final result matches the int value, I changed suma and std::accumulate to use only int values. After this change, the MSVC 2012 compiler was able to auto-vectorize the for loop and led to the following points.

 // change suma from long long to int Accumulate runtime 93600 ms Manual sum runtime 46800 ms 
+5
source share

After fixing the accumulation problem, others noted that I tested Visual Studio 2008 and 2010 and accumulated, really, faster than the manual cycle.

Looking at the disassembly, I saw that an additional iterator check is being performed in the manual loop, so I decided to switch to just an array to eliminate it.

Here is what I ended up with:

 #include <Windows.h> #include <iostream> #include <numeric> #include <stdlib.h> int main() { const size_t vsize = 100*1000*1000; int* x = new int[vsize]; for (size_t i = 0; i < vsize; i++) x[i] = rand() % 1000; LARGE_INTEGER start,stop; long long suma = 0, sumb = 0, timea = 0, timeb = 0; QueryPerformanceCounter( &start ); suma = std::accumulate(x, x + vsize, 0LL); QueryPerformanceCounter( &stop ); timea = stop.QuadPart - start.QuadPart; QueryPerformanceCounter( &start ); for (size_t i = 0; i < vsize; ++i) sumb += x[i]; QueryPerformanceCounter( &stop ); timeb = stop.QuadPart - start.QuadPart; std::cout << "Accumulate: " << timea << " - " << suma << std::endl; std::cout << " Loop: " << timeb << " - " << sumb << std::endl; delete [] x; return 0; } Accumulate: 633942 - 49678806711 Loop: 292642 - 49678806711 

Using this code, the manual loop is easily removed. The big difference is that the compiler deployed the manual loop 4 times, otherwise the generated code is almost identical.

+1
source share

All Articles