GCC 4.9.2 in the compiler explorer does the loop deployment and includes many function calls, while Clang 3.5.1 calls fib twice every iteration without even a tail call optimization as shown below
fib(int): # @fib(int) push rbp push rbx push rax mov ebx, edi cmp ebx, 2 jge .LBB0_1 mov eax, ebx jmp .LBB0_3 .LBB0_1: lea edi, dword ptr [rbx - 1] call fib(int) # fib(ebx - 1) mov ebp, eax add ebx, -2 mov edi, ebx call fib(int) # fib(ebx - 2) add eax, ebp .LBB0_3: add rsp, 8 pop rbx pop rbp ret
The GCC version is more than 10 times longer: only one fib call and 20+ labels to embed the call , which also means that the last call was optimized for jmp or GCC converted part of the recursion to iteration (since it allocates a large array to store intermediate values)
I also introduced ICC in perspective, and surprisingly it contains 10 call instructions inside fib , and it also calls inline fib 9 times inside main , but it does not convert recursive code to iterative
Here the compiler displays for comparison
Please note that you can change the code to make the output easier to read.
int fib(int n) { if (n<2) return n; int t = fib(n-1); return t + fib(n-2); }
Now the compiler explorer will highlight which line of the source code the instruction in the output of the assembly matches with different colors, and you can easily see how two calls are made. The string return t + fib(n-2) compiled by GCC for
.L3: mov eax, DWORD PTR [rsp+112] # n, %sfp add edx, DWORD PTR [rsp+64] # D.35656, %sfp add DWORD PTR [rsp+60], edx # %sfp, D.35656 sub DWORD PTR [rsp+104], 2 # %sfp,