Why is the vector (size) slower than the new []?

I compared some STL algorithms, and I was surprised at the time spent on the following code: (I measured the compiled g ++ code [without optimization] using the time command)

 #include <vector> struct vec2{ int x, y; vec2():x(0), y(0) {} }; int main(int argc, char* argv[]){ const int size = 200000000; std::vector<vec2> tab(size); //2.26s // vec2* tab = new vec2[size]; //1.29s // tab[0].x = 0; // delete[] tab; return 0; } 

The time taken to initialize the vector is 2.26 s, and new (and delete ) is 1.29 s. What does the ctor vector do that take a lot longer? new[] calls the constructor on each element, just like vector ctor, right?

Then I compiled with -O3, it went faster, but there was a gap between the two codes. (I got 0.83 and 0.75 s respectively)

Any ideas?

+7
source share
3 answers

The speed will depend on the implementation, but most likely the reason that the vector is slower is because the vector cannot create its own elements by default. Vector elements are always copied. for example

 std::vector<vec2> tab(size); 

actually interpreted as

 std::vector<vec2> tab(size, vec2()); 

i.e. the second argument gets its value from the default argument. Then the vector allocates raw memory and copies this element, built by default, transferred externally to each element of the new vector (using copy-constructor). This can be generally slower than building the default of each element directly (as new[] does).

To illustrate the difference with the code sketch, new vec2[size] roughly equivalent

 vec2 *v = (vec2 *) malloc(size * sizeof(vec2)); for (size_t i = 0; i < size; ++i) // Default-construct `v[i]` in place new (&v[i]) vec2(); return v; 

and vector<vec2>(size) roughly equivalent

 vec2 source; // Default-constructed "original" element vec2 *v = (vec2 *) malloc(size * sizeof(vec2)); for (size_t i = 0; i < size; ++i) // Copy-construct `v[i]` in place new (&v[i]) vec2(source); return v; 

Depending on the implementation, the second approach may be slower.

The twofold difference in speed is difficult to justify, but comparing non-optimized code does not make sense. The significantly smaller difference that you have seen with optimized code is exactly what you would reasonably expect in this case.

+9
source

Both versions initialize memory.

As pointed out by several people, the vector uses copy construction, while the array uses the default constructor. Your compiler seems to optimize the latter better than the former.

Note that in Real Life, you rarely want to initialize such a huge array in one fell swoop. (What is the use of a bunch of zeros? Obviously, you intend to add something there ... And initializing hundreds of megabytes is very unsafe.)

Instead, you should write something like:

 const int size = 200000000; std::vector<vec2> v; v.reserve(size); 

Then, when you are ready to put the real vector in the vector, you use v.push_back(element) . reserve() allocates memory without initializing it; copy the push_back() construct to the reserved space.

Alternatively, if you want to put a new vector in a vector, you can use v.resize(v.size()+1) and then change the v.back() element. (This is how a “pool allocator” can work.) Although this sequence initializes the item and then overwrites it, it all happens in the L1 cache, which is almost as fast as not initializing it at all.

So, for a fair comparison, try a large vector (with reserve ) compared to an array to create a sequence of non-identical elements. You must find the vector faster.

+7
source

After analyzing the assembly generated by VC ++ for these two cases, here is what I found. The compiler built in almost everything and generated very similar loops for initialization after memory allocation. In case the vector inner loop looks like this:

 013E3FC0 test eax,eax 013E3FC2 je std::_Uninit_def_fill_n<vec2 *,unsigned int,vec2,std::allocator<vec2>,vec2>+19h (13E3FC9h) 013E3FC4 mov dword ptr [eax],edx 013E3FC6 mov dword ptr [eax+4],esi 013E3FC9 add eax,8 013E3FCC dec ecx 013E3FCD jne std::_Uninit_def_fill_n<vec2 *,unsigned int,vec2,std::allocator<vec2>,vec2>+10h (13E3FC0h) 

where edx and esi registers were looped out of loop:

 00013FB5 xor edx,edx 00013FB7 xor esi,esi 00013FB9 lea esp,[esp] 

In the case of new[] inner loop looks like this:

 009F1800 mov dword ptr [ecx],0 009F1806 mov dword ptr [ecx+4],0 009F180D add ecx,8 009F1810 dec edx 009F1811 jns main+30h (9F1800h) 

The differences are very slight, a few more instructions in the case of vector , but probably also faster than mov from registers. Since in most real cases, constructors do much more than assigning zeros, this difference can hardly be noticeable at all. Thus, the value of this testing is doubtful.

+2
source

All Articles