Pointers to pointers - the reason for the performance penalty

I answered this question and noticed what I consider to be strange compiler behavior.

I wrote this program first (as part of my answer there):

    class Vector {
    private:
        double** ptr;
    public:
        Vector(double** _ptr): ptr(_ptr) {}
        inline double& operator[](const int iIndex) const { return *ptr[iIndex]; }
    };

    extern "C" int test(const double a);

    int main() {
        double a[2] = { 1.0, 2.0 };
        Vector va((double**) &a);
        double a1 = va[0];
        test(a1);
        double a2 = va[0];
        test(a2);
    }

which generates two load instructions when compiled with:

 clang -O3 -S -emit-llvm main.cpp -o main.ll 

This can be seen in the LLVM IR (and likewise in the assembly):

    define i32 @main() #0 {
    entry:
      %a.sroa.0.0.copyload = load double*, double** bitcast ([2 x double]* @_ZZ4mainE1a to double**), align 16
      %0 = load double, double* %a.sroa.0.0.copyload, align 8, !tbaa !2
      %call1 = tail call i32 @test(double %0)
      %1 = load double, double* %a.sroa.0.0.copyload, align 8, !tbaa !2
      %call3 = tail call i32 @test(double %1)
      ret i32 0
    }

I would expect only one load, since no function with side effects on memory is called in between, and I never associated this object with anything that has side effects. In fact, reading the program, I would simply expect two calls

 test(1.0); 

since my array is constant in memory and everything can be constant-folded.

To be sure, I replaced the double pointer with a simple pointer:

    class Vector {
    private:
        double* ptr;
    public:
        Vector(double* _ptr): ptr(_ptr) {}
        inline double& operator[](const int iIndex) const { return ptr[iIndex]; }
    };

    extern "C" int test(const double a);

    int main() {
        double a[2] = { 1.0, 2.0 };
        Vector va(a);
        double a1 = va[0];
        test(a1);
        double a2 = va[0];
        test(a2);
    }

Compiled with the same line, I get the expected result:

    define i32 @main() #0 {
    entry:
      %call1 = tail call i32 @test(double 1.000000e+00)
      %call3 = tail call i32 @test(double 1.000000e+00)
      ret i32 0
    }

It looks better optimized :)

So my question is:

What prevents the compiler from performing the same optimization in the first code example? Is it the double pointers?

2 answers

The error is in these lines:

    double a[2] = { 1.0, 2.0 };
    Vector va((double**) &a);

a is an array of two doubles. It decays into a double * , but &a is not a double ** . Arrays and pointers are not the same animal.

In fact, (void *) a == (void *) &a , because the address of an array is the address of its first element.

If you want to create a pointer to a pointer, you must explicitly create a true pointer:

    double a[2] = { 1.0, 2.0 };
    double *pt = a; // or &(a[0])
    ...
    Vector va(&pt); // &pt is a genuine double**

In your second code, the compiler is trying to access:

 va.ptr[0] 

The compiler can infer that va.ptr equals &a[0] , and since a is a non-volatile local variable of main , it also knows that you never change a[0] ( test has no access to a , as it receives its argument by value), so it can reduce your code to calls to test with a constant value.

In your first code, however, the compiler knows that it is trying to access:

 *(((double**)&a)[index]) 

Although ((double**)&a)[index] can be computed by the compiler (its value is implementation-dependent), you get a pointer holding an address like 0x3ff0000000000000 (on my machine, the bit pattern of 1.0). What the expression then does is access the value stored at that address, but that value could be changed by test or even by something else entirely. There is no way the compiler can assume that the value at that address does not change between the first access and the second.

Note: if instead of double** you had used double (*)[2] , you would get the same result as with the second code, and your code would be well-defined.


Your first code is basically equivalent to:

    extern "C" int test(const double a);

    int main() {
        double a[2] = { 1.0, 2.0 };
        double **pp = (double**) &a;
        double *p = pp[0];
        double a1 = *p;
        test(a1);
        double a2 = *p;
        test(a2);
    }

You get the same disassembly with the same command line.

Assuming an architecture with 4-byte doubles and 4-byte pointers, the memory looks something like this at run time:

    0x7fff4f40  0x3f800000  # 1.0
    0x7fff4f44  0x40000000  # 2.0

Since a is an array of double , &a has the type double (*)[2] and the value 0x7fff4f40 .

Now you cast &a to double** , so you have a double **pp with the value 0x7fff4f40 . From it you extract double *p with pp[0] ; since a pointer is also 4 bytes on this hypothetical architecture, you get 0x3f800000 .

Great, so the compiler can optimize up to this point; basically it can produce something like this:

    double *p = (double*) 0x3f800000;
    double a1 = *p;
    test(a1);
    double a2 = *p;
    test(a2);

Now the million-dollar question: what is at 0x3f800000 ? Nobody knows, not even the compiler. The value at that address can be changed at any time by a call to test() or even by an external source.

I do not recall the exact constraints on the sizes of double and pointer types, but on a hypothetical architecture where sizeof(double*) > 2 * sizeof(double) , the compiler could not even compute p , because you would be reading beyond the end of a .

