Unnecessary pop instructions in functions with an early-exit if statement

While playing with godbolt.org, I noticed that gcc (6.2, 7.0 snapshot), clang (3.9) and icc (17), when compiling something close to

    int a(int* a, int* b)
    {
        if (b - a < 2) return *a = ~*a;

        // register intensive code here eg sorting network
    }

compile it (-O2 / -O3) into something like this:

        push    r15
        mov     rax, rcx
        push    r14
        sub     rax, rdx
        push    r13
        push    r12
        push    rbp
        push    rbx
        sub     rsp, 184
        mov     QWORD PTR [rsp], rdx
        cmp     rax, 7
        jg      .L95
        not     DWORD PTR [rdx]
    .L162:
        add     rsp, 184
        pop     rbx
        pop     rbp
        pop     r12
        pop     r13
        pop     r14
        pop     r15
        ret

which obviously has a huge overhead in the case of b - a < 2. With -Os, gcc compiles it to:

        mov     rax, rcx
        sub     rax, rdx
        cmp     rax, 7
        jg      .L74
        not     DWORD PTR [rdx]
        ret
    .L74:

This leads me to believe that nothing in the code prevents the compiler from emitting this shorter version.

Is there a reason why compilers do this? Is there a way to get them to compile a shorter version without compiling for size?


Here's an example on Godbolt that reproduces this. This seems to be due to the fact that the complex part is recursive.

1 answer

This is a known compiler limitation; see my comments on the question. IDK why it exists; it may be difficult for compilers to decide what they can do without spilling when they have not finished saving regs.

Pulling the early-out check into a wrapper is often useful when the wrapper is small enough to inline.


It looks like modern gcc can actually sidestep this limitation in some cases.

Using your example in the Godbolt compiler explorer, adding a second caller is enough to get even gcc6.1 -O2 to split the function for you, so that it can inline the early-out into the second caller and into the externally visible square() (which ends with jmp square(int*, int*) [clone .part.3] if the early-return path is not taken).

Code on Godbolt. Note that I added -std=gnu++14, which clang needs in order to compile your code.

    void square_inlinewrapper(int* a, int* b) {
      //if (b - a < 16) return;  // gcc inlines this part for us, and calls a private clone of the function!

      return square(a, b);
    }

    # gcc6.1 -O2  (default / generic -march= and -mtune=)
        mov     rax, rsi
        sub     rax, rdi
        cmp     rax, 63
        jg      .L9
        rep ret
    .L9:
        jmp     square(int*, int*) [clone .part.3]

square() itself compiles to the same thing, invoking a private clone that has the bulk of the code. Recursive calls from within the clone call the wrapper function, so they don’t do the extra push / pop work when it’s not needed.


Even gcc7 does not do this when there is no other caller, even with -O3. It still converts one of the recursive calls into a loop, and the other one just calls the big function again.


Clang 3.9 and icc17 do not clone the function either, so you have to write the inlinable wrapper manually (and change the main body of the function to use it for recursive calls, if the check is needed there).

You might want to name the wrapper square and rename only the main part to a private name (for example, static void square_impl ).
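A minimal sketch of that manual split, assuming a single translation unit. The names square and square_impl and the threshold of 16 come from the example above; the body of square_impl is a hypothetical placeholder (it just counts how often each element is visited), since the real register-heavy code is not shown here.

```cpp
#include <cassert>

// Heavy part. In real code this is the register-intensive body that needs
// the push/pop prologue and epilogue. The work below is only a placeholder.
static void square_impl(int* a, int* b);

// Small wrapper: only the early-out check lives here, so it is cheap to
// inline into callers, and the fast path never touches callee-saved registers.
void square(int* a, int* b) {
    if (b - a < 16) return;      // early out
    square_impl(a, b);
}

static void square_impl(int* a, int* b) {
    assert(b - a >= 16);         // precondition guaranteed by the wrapper

    for (int* p = a; p != b; ++p)
        *p += 1;                 // placeholder work: count visits

    int* mid = a + (b - a) / 2;
    // Recursive calls go through the wrapper, so small sub-ranges
    // return before any push/pop work is done.
    square(a, mid);
    square(mid, b);
}
```

Whether the wrapper actually gets inlined still depends on the compiler and options, so it is worth checking the generated asm. If other translation units should benefit too, the wrapper would have to be visible to them (for example in a header), and square_impl could then no longer be static.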
