Performance difference between multiple pointer dereferences and references

This is a question I was asked a few months ago:

Which of the following functions will execute faster, Foo1 or Foo2?

    void Foo1(SomeObject** array, unsigned int size)
    {
        for (unsigned int i = 0; i < size; i++)
        {
            if (((*array) + i) != NULL)
            {
                ((*array) + i)->Operation1();
                ((*array) + i)->Operation2();
                ((*array) + i)->Operation3();
                ((*array) + i)->Operation4();
                ((*array) + i)->Operation5();
                ((*array) + i)->Operation6();
            }
        }
    }

    void Foo2(SomeObject** array, unsigned int size)
    {
        for (unsigned int i = 0; i < size; i++)
        {
            if (((*array) + i) != NULL)
            {
                SomeObject& obj = *((*array) + i);
                obj.Operation1();
                obj.Operation2();
                obj.Operation3();
                obj.Operation4();
                obj.Operation5();
                obj.Operation6();
            }
        }
    }
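For context, a call site matching these signatures might look something like this (a sketch only, assuming a contiguous buffer of objects; the buffer name and size are made up):

    // Hypothetical call site (not part of the original question):
    // one contiguous buffer of objects, passed through a
    // pointer-to-pointer as the signatures require.
    SomeObject* buffer = new SomeObject[100];
    SomeObject* base = buffer;
    Foo1(&base, 100);
    Foo2(&base, 100);
    delete[] buffer;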

Please note that this is from memory, so I don't remember the code exactly, but the general idea is the same. One function uses a pointer while the other uses a reference (and it may have taken a pointer to an array, as in the code above, but I don't remember exactly). I said I was not sure and would have to profile the code to find out, but that if I had to guess, Foo2 'may' be faster. They were not impressed...

This has stuck with me: several times since, I have come across code like this (or written it) and wondered what I should do in that case.

I know that...

  • This is micro-optimization.
  • The compiler will most likely optimize it away anyway.

EDIT: I changed the code a bit, so now it checks for a NULL pointer.

+8
c++
8 answers

I think this is a rather interesting question, and I have seen a lot of assumptions about what the compiler might do, so I wanted to take a closer look and find out for sure. I took e.James' program and ran it through GCC to get the assembly. I should say up front that I don't really know assembly, so someone correct me if I am wrong, but I think we can reasonably deduce what is happening. :)

Compilation with -O0 (no optimization)

For Foo1 we see that the array offset is calculated before each function call:

    movl    8(%ebp), %eax
    movl    (%eax), %edx
    movl    -4(%ebp), %eax
    leal    (%edx,%eax), %eax
    movl    %eax, (%esp)
    call    __ZN10SomeObject10Operation1Ev

This pattern repeats for all six method calls, just with a different method name each time. Foo2 has a bit of setup code to obtain the reference:

    movl    8(%ebp), %eax
    movl    (%eax), %edx
    movl    -4(%ebp), %eax
    leal    (%edx,%eax), %eax
    movl    %eax, -8(%ebp)

This is followed by six blocks that simply push the saved pointer onto the stack and make the call:

    movl    -8(%ebp), %eax
    movl    %eax, (%esp)
    call    __ZN10SomeObject10Operation1Ev

Pretty much what we would expect without optimization. The timings were:

    Foo1: 18472
    Foo2: 17684

Compilation with -O1 (minimal optimization)

Foo1 is a bit more efficient, but still recomputes the array offset every time:

    movl    %esi, %eax
    addl    (%ebx), %eax
    movl    %eax, (%esp)
    call    __ZN10SomeObject10Operation1Ev

Foo2 computes the address once, saves it in %ebx ( addl (%edi), %ebx ), and then makes the calls like this:

    movl    %ebx, (%esp)
    call    __ZN10SomeObject10Operation1Ev

The timings were:

    Foo1: 4979
    Foo2: 4977

Compilation with -O2 (moderate optimization)

When compiling with -O2, GCC simply got rid of all of it: each call to Foo1 or Foo2 just adds 594 to dummy for every object in the array (99 increments * 6 calls = 594 increments):

    imull   $594, %eax, %eax
    addl    %eax, _dummy

There were no calls to the object's methods at all, although the methods themselves remained in the binary. As expected, the timings were:

    Foo1: 1
    Foo2: 0

I think this tells us that Foo2 is a little faster without optimization, but the point is fairly moot, because as soon as the optimizer kicks in, the whole thing collapses into shuffling a couple of values between the stack and the registers.
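In source terms, what GCC produced at -O2 amounts to roughly the following (my own reconstruction from the assembly above, so treat it as a sketch; dummy is the global counter from e.James' test program below):

    // Sketch of what the -O2 output amounts to: the six method
    // calls and their 99-iteration inner loops are folded into a
    // single multiply-and-add on the global counter.
    void FooCollapsed(unsigned int size)
    {
        dummy += 594u * size;   // 6 calls * 99 increments, per object
    }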

+6

Strictly speaking, without optimizations, I would say that Foo2 is faster, because Foo1 has to calculate the destination address every time; but that situation will not come up anywhere in practice.

I would expect the compiler to optimize them into the same code.
There is plenty of room for it to do so: neither i nor array changes within the body of an iteration, so it can hoist the computed pointer into a register, exactly as the reference version does.
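To illustrate, the optimization described here amounts to rewriting Foo1 by hand like this (a sketch of the transformation, not actual compiler output):

    // Hand-hoisted version of Foo1: the subexpression (*array) + i
    // is computed once per iteration and kept in a local, which the
    // compiler is free to keep in a register.
    void Foo1Hoisted(SomeObject** array, unsigned int size)
    {
        for (unsigned int i = 0; i < size; i++)
        {
            SomeObject* p = (*array) + i;   // computed once per iteration
            if (p != NULL)
            {
                p->Operation1();
                p->Operation2();
                p->Operation3();
                p->Operation4();
                p->Operation5();
                p->Operation6();
            }
        }
    }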

+4

The compiler will probably optimize them to be the same, given the common subexpressions on each line. No guarantees, however.

With today's compilers and processors, there is no rational conclusion you can reach just by reasoning about it. The only way to find out is to actually time it. If a candidate did not make it clear that this is the answer, it would be an automatic rejection from me.
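For what it's worth, a minimal timing harness could look like this (my own sketch using std::chrono; SomeObject, Foo1, and Foo2 are the declarations from the question, and a real measurement would loop many times, as e.James' program below does):

    #include <chrono>
    #include <iostream>

    // Times a single run of each function. In practice you would
    // repeat each run many times and compare the totals.
    void TimeBoth(SomeObject** array, unsigned int size)
    {
        using Clock = std::chrono::steady_clock;

        auto t0 = Clock::now();
        Foo1(array, size);
        auto t1 = Clock::now();
        Foo2(array, size);
        auto t2 = Clock::now();

        using std::chrono::duration_cast;
        using std::chrono::microseconds;
        std::cout << "Foo1: " << duration_cast<microseconds>(t1 - t0).count() << " us\n"
                  << "Foo2: " << duration_cast<microseconds>(t2 - t1).count() << " us\n";
    }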

+3

Any intelligent compiler will happily make them equivalent. You are not talking about NRVO in the depths of template metaprogramming here; this is plain and simple Common Subexpression Elimination, which is extremely common and relatively straightforward, and the code is trivial in complexity, which makes it extremely likely that the compiler will perform the optimization.

+2

Just in case anyone doubts that the compiler will optimize these down to the same result, here is a quick and dirty test program:

    #include <iostream>
    #include <time.h>

    using namespace std;

    size_t dummy;

    class SomeObject
    {
    public:
        void Operation1();
        void Operation2();
        void Operation3();
        void Operation4();
        void Operation5();
        void Operation6();
    };

    void SomeObject::Operation1() { for (int i = 1; i < 100; i++) { dummy++; } }
    void SomeObject::Operation2() { for (int i = 1; i < 100; i++) { dummy++; } }
    void SomeObject::Operation3() { for (int i = 1; i < 100; i++) { dummy++; } }
    void SomeObject::Operation4() { for (int i = 1; i < 100; i++) { dummy++; } }
    void SomeObject::Operation5() { for (int i = 1; i < 100; i++) { dummy++; } }
    void SomeObject::Operation6() { for (int i = 1; i < 100; i++) { dummy++; } }

    void Foo1(SomeObject** array, unsigned int size)
    {
        for (unsigned int i = 0; i < size; i++)
        {
            ((*array) + i)->Operation1();
            ((*array) + i)->Operation2();
            ((*array) + i)->Operation3();
            ((*array) + i)->Operation4();
            ((*array) + i)->Operation5();
            ((*array) + i)->Operation6();
        }
    }

    void Foo2(SomeObject** array, unsigned int size)
    {
        for (unsigned int i = 0; i < size; i++)
        {
            SomeObject& obj = *((*array) + i);
            obj.Operation1();
            obj.Operation2();
            obj.Operation3();
            obj.Operation4();
            obj.Operation5();
            obj.Operation6();
        }
    }

    int main(int argc, char* argv[])
    {
        clock_t timer;
        SomeObject* array[100];

        for (int i = 0; i < 100; i++)
        {
            array[i] = new SomeObject();
        }

        timer = clock();
        for (int i = 0; i < 100000; i++)
        {
            Foo1(array, 100);
        }
        cout << "Foo1: " << clock() - timer << endl;

        timer = clock();
        for (int i = 0; i < 100000; i++)
        {
            Foo2(array, 100);
        }
        cout << "Foo2: " << clock() - timer << endl;

        for (int i = 0; i < 100; i++)
        {
            delete array[i];
        }

        return 0;
    }

Results are always within a few milliseconds of each other:

    Foo1: 15437
    Foo2: 15484

+2

IMHO, the question of which version is faster misses the point. Calling six different methods on an object, one after another, is an OO design smell. The object should probably offer a single method that does all of this.
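Something along these lines, perhaps (a sketch; the combined method name is made up):

    class SomeObject
    {
    public:
        // Hypothetical combined method replacing the six separate calls.
        void DoAllOperations()
        {
            Operation1();
            Operation2();
            Operation3();
            Operation4();
            Operation5();
            Operation6();
        }

        void Operation1();
        void Operation2();
        void Operation3();
        void Operation4();
        void Operation5();
        void Operation6();
    };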

+1

Foo2 has that extra reference creation, but even so the compiler should make them approximately the same.

0

I like this question, and I can't help but answer, even though I know I am probably completely wrong. Still, I think Foo1 will be faster.

My (admittedly naive) reasoning? Well, I see that Foo2 creates a reference to the object, taking the address from array, and only then calls its methods.

But Foo1 uses the address directly, jumps straight into the object's memory, and calls the function directly. There is no extra object reference created in Foo1 the way there is in Foo2. And we do not know how deep the object's inheritance hierarchy goes, or how many base-class constructors would be involved just to obtain a reference to the object, which is extra time. So I think Foo1 is a little faster. Please correct me, because I am sure I am wrong. Cheers!

0
