Why are these function calls not optimized?

Question


I tried compiling this code with both Clang and GCC:

    struct s { int _[50]; };

    void (*pF)(const struct s), (*pF1)(struct s), (*pF2)(struct s *);

    int main(void)
    {
        struct s a;

        pF2(&a);
        pF(a), pF1(a);
    }

The result is the same with both compilers. Although the function called through pF is not allowed to modify its only argument, the object a is copied again for the second call, the one to pF1. Why is this?

Here is the generated assembly (from GCC):

    ; main
    push    rbx
    sub     rsp, 0D0h
    mov     rbx, rsp
    mov     rdi, rsp
    call    cs:pF2

    ;create argument for pF1 call (as there the argument is modified)
    ;and copy the local a into it
    ;although it seems not needed because the local isn't further read anyway
    sub     rsp, 0D0h
    mov     rsi, rbx
    mov     ecx, 19h
    mov     rdi, rsp
    rep movsq
    call    cs:pF

    ;copy the local a into the argument created once again
    ;though the argument cannot be modified by the function pointed by pF
    mov     rdi, rsp
    mov     rsi, rbx
    mov     ecx, 19h
    rep movsq
    call    cs:pF1

    add     rsp, 1A0h
    xor     eax, eax
    pop     rbx
    retn

Couldn't the optimizer see that the function pointed to by pF cannot change its parameter (it is declared const) and therefore omit the last copy? In addition, I recently realized that since the variable a is not read again later in the code, the compiler could use its storage directly for the function arguments.

The same code could then be compiled to:

    ; main
    push    rbx
    sub     rsp, 0D0h

    mov     rdi, rsp
    call    cs:pF2
    call    cs:pF
    call    cs:pF1

    add     rsp, 0D0h
    xor     eax, eax
    pop     rbx
    retn

I am compiling with the -O3 flag. Did I miss something?

The result is the same even if I avoid invoking UB (the function pointers are NULL by default) and initialize them instead:

    #include <stdio.h>

    struct s { int _[50]; };

    extern void f2(struct s *a);

    void (*pF)(const struct s), (*pF1)(struct s), (*pF2)(struct s *) = f2;

    extern void f1(struct s a)
    {
        a._[2] = 90;
    }

    extern void f(const struct s a)
    {
        for (size_t i = 0; i < sizeof(a._)/sizeof(a._[0]); ++i)
            printf("%d\n", a._[i]);
    }

    extern void f2(struct s *a)
    {
        a->_[6] = 90;
        pF1 = f1, pF = f;
    }


optimization c assembly gcc clang

AnArrayOfFunctions Mar 08 '16 at 12:52


2 answers

gsg · Answer 1 · 2016-03-08T13:51:37+0000

I do not think this optimization is legal. What you are overlooking is that a function type with a const parameter is compatible with a function type with a non-const parameter, so a function that mutates its argument can be assigned to the pointer pF.

Here is an example program:

    struct s { int x; };

    /* Black hole so that DCE doesn't eat everything */
    void observe(void *);

    void (*pF)(const struct s);

    void test(struct s arg)
    {
        arg.x = 0;
        observe(&arg);
    }

    void assignment(void)
    {
        pF = test;
    }

The upshot is that the const annotation on the parameter gives the compiler no reliable information about whether the argument's storage is mutated by the callee. Performing this optimization would seem to require an ABI under which argument storage must not be mutated (or some kind of whole-program analysis, but never mind that).

Peter Cordes · Answer 2 · 2016-03-08T15:18:16+0000

As I understand it, the function still has to make one copy (see the end for what I consider the optimal legal version). The rest are (more or less understandable) missed optimizations.

The SysV x86-64 ABI does not guarantee that a function will not modify its stack arguments. It says nothing about const. Anything it does not guarantee cannot be assumed. It simply says that large objects passed by value go on the stack; it says nothing about their state when the called function returns. The callee "owns" its arguments, even if they are declared const. See also the x86 tag wiki, although the ABI doc itself is the only link there that is really relevant.

Similarly, narrow integer types can sit in registers with garbage in the high bits, whether as arguments or as return values. The ABI does not explicitly say anything either way, so there is no guarantee that the high bits are zeroed. This is in fact what gcc does: it assumes there may be garbage in the high bits of values it receives, and it leaves garbage in the high bits of values it passes. The same goes for float / double in xmm regs. I recently confirmed this with one of the ABI maintainers while investigating some unsafe code generated by clang. So I am sure the correct interpretation is that you must not assume anything the ABI does not explicitly guarantee.

gcc does not do this, but I believe it would be legal to avoid making any copy in a function like this:

    void modifyconstarg(const struct s x) {
        // x._[10] = 10;       // This is a compile-time error
        struct s xtmp = x;     // gcc/clang: make a full copy before this
        xtmp._[11] = 11;
        pFconstval(xtmp);      // gcc/clang: make a full copy here
    }

Instead, the compiler could just store into the incoming arg in place and jmp pFconstval.

My guess is that these are missed optimizations, rather than gcc and clang being conservative in their interpretation of the standard.

It seems that gcc and clang do not work very hard at optimizing away copies of objects too large to fit in a register. Source code that does not copy them in the first place (for example, passing by const * or by C++ const reference) would be better than even the best job a compiler could do here, since I do not think your proposed optimization is legal.

gcc and clang do much worse than the best legal optimization: see the output on godbolt.

Strangely, with -march=haswell (or any other Intel CPU), gcc emits a call to memcpy instead of inline rep movsq. I do not understand why. It does this even with -ffreestanding / -nostdlib.

IDK if anyone else thought rdi was being passed as a pointer to the memory, i.e. that the struct was passed by invisible reference. It took me a long time to fully grasp that the by-value functions take no arguments in registers at all. I kept thinking it was odd that rep movsq left rdi pointing past the end of the copy.

You do not need function pointers to reproduce this; ordinary prototyped functions (with more descriptive names) demonstrate it just as well.

    struct s { int _[50]; };

    //void (*pFconstval)(const struct s), (*pFval)(struct s), (*pFref)(struct s *);
    void pFref(struct s *);
    void pFconstval(const struct s), pFval(struct s);

    void func(void) {
        struct s a;
        pFref(&a);
        pFconstval(a);
        pFval(a);
    }

    void modifyconstarg(const struct s x) {
        // x._[10] = 10;       // This is a compile-time error
        struct s xtmp = x;     // full copy here
        xtmp._[11] = 11;
        pFconstval(xtmp);      // full copy here
    }

    void modifyarg(struct s x) {
        x._[10] = 10;
        pFconstval(x);
    }

The gcc output for modifyarg is fun:

    lea     rdi, [rsp+8]
    mov     DWORD PTR [rsp+48], 10
    mov     ecx, 25
    mov     rsi, rdi            ; src=dest
    rep movsq                   ; in-place "copy"
    jmp     pFconstval

It makes this copy even if you do not modify x. Clang makes an actual copy to another location before the tail-call jmp.

The best legal version of your function

as I understand the ABI:

    sub     rsp, 416
    mov     rdi, rsp
    call    pFref          ; or call [pF2] if using function pointers. Is your disassembly in MASM syntax?

    lea     rdi, [rsp+208] ; aligned by 16 for hopefully better rep movsq perf
                           ; and so the stack is aligned by 16 at each location
    mov     rsi, rsp
    mov     ecx, 25
    rep movsq
    call    pFconstval     ; clobbering the low copy

    add     rsp, 208
    call    pFval          ; clobbering the remaining high copy

    add     rsp, 208
    ret

BTW, gcc's use of rbx is silly. It saves four bytes of code:
push / pop: 2 bytes. mov rbx, rsp: 3B. 2x mov rsi, rbx: 2x 3B. Total = 12B.

Replacing all of this with 2x lea rsi, [rsp+208]: 2x 8B. Total = 16B.

It does not avoid extra stack-sync uops, since mov rdi, rsp is still used. 4B of code is not worth spending 3 uops. In my version, which copies only once (and needs only one LEA), using rbx would be a loss in code bytes as well.
