I have always understood that rich C++ abstractions carry some computational overhead, but I was under the impression that this overhead would be close to negligible once the right compiler optimizations were applied. I was curious how large the overhead actually is, so I wrote a simple test to find out. The test is a template function that takes a container, assigns a value to every element of the container, and then sums those values in a separate loop. This process is repeated for a given number of cycles.
What I discovered, to my considerable concern, was that the vector implementation took almost three times as long as the plain array implementation. After working through a wide range of compiler optimization flags without success, I decided to bite the bullet and eyeball the assembly code directly to try to see what was causing the time penalty. I included some assembly directives that let me pinpoint exactly where the array indexing operation takes place, and analysed the code in detail. What I discovered, to my complete confusion, was that the difference between the vector implementation and the array implementation was entirely insignificant. The assembly code can be found here.
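To pull out the regions of interest, something along these lines works against the tagged assembly (shown purely for illustration; the marker strings are the ones passed to ASMTAG in the code below):

sed -n '/preassign/,/postassign/p' vectorArrayOp.s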
This is the command I used to create the binary:
g++ -O3 vectorArrayOp.cpp -o vectorArrayOp
This is the command I used to generate the assembly (-DTAGASM enables the ASMTAG markers; the timed binary above is built without them):
g++ -O3 -DTAGASM vectorArrayOp.cpp -S -o vectorArrayOp.s
This is the result that I observe when running the binary:
gmurphy@interloper:Reference$ ./vectorArrayOp
Duration 0.027678
Duration 0.090212
The results do not change when the computed value is also written to stdout (roughly as sketched below); I removed it here for clarity.
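Including the value just means printing the returned tag next to each duration, something like the following (illustrative only; ad, vd and the t member come from the code below):

std::cout << "Duration " << ad.d << " tag " << ad.t << std::endl;
std::cout << "Duration " << vd.d << " tag " << vd.t << std::endl;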
My system specification is as follows (I saw the same results on my AMD machine):

Linux 3.2.0-32-generic x86_64 GNU/Linux
Intel(R) Xeon(R) CPU X5550 @ 2.67GHz
g++ (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3
The code follows. I would appreciate it if someone could give me some insight into why the timings are so different when the assembly is so similar.
#include <vector>
#include <iostream>
#include <sys/time.h>

#ifdef TAGASM
#define ASMTAG(X) asm(X)
#else
#define ASMTAG(X)
#endif

enum { DataSize=1024, NumTests=(1<<16) };

struct ReturnValue
{
    ReturnValue(float _d, int _t) : d(_d), t(_t) {}
    float d;
    int t;
};

template <typename Container, typename Type>
ReturnValue runTest(Container &c, Type value)
{
    int tagValue(0);
    timeval startTime;
    gettimeofday(&startTime, NULL);
    for(int i=0; i<NumTests; i++)
    {
        for(int j=0; j<DataSize; j++)
        {
            ASMTAG("preassign");
            c[j] = value;
            ASMTAG("postassign");
        }
        for(int j=0; j<DataSize; j++)
        {
            ASMTAG("preadd");
            tagValue += c[j];
            ASMTAG("postadd");
        }
    }
    timeval endTime;
    gettimeofday(&endTime, NULL);
    float duration((endTime.tv_sec-startTime.tv_sec)+
                   (endTime.tv_usec-startTime.tv_usec)/1000000.0);
    // tagValue is returned in case the optimising compiler might try to remove the loops
    return ReturnValue(duration, tagValue);
}

int main()
{
    int *arrayData = new int[DataSize];
    std::vector<int> vectorData(DataSize, 0);

    ReturnValue ad = runTest(arrayData, 1);
    ReturnValue vd = runTest(vectorData, 1);

    std::cout << "Duration " << ad.d << std::endl;
    std::cout << "Duration " << vd.d << std::endl;

    delete [] arrayData;
    return 0;
}