How does GCC optimize an unused variable that grows inside the loop?

I wrote this simple C program:

int main() { int i; int count = 0; for(i = 0; i < 2000000000; i++){ count = count + 1; } } 

I wanted to see how the gcc compiler optimizes this loop (explicitly add 1,200,000,000 times, it should be "add 2,000,000,000 once"). So:

gcc test.c and then time on a.out gives:

 real 0m7.717s user 0m7.710s sys 0m0.000s 

$ gcc -O2 test.c , and then time on a.out` gives:

 real 0m0.003s user 0m0.000s sys 0m0.000s 

Then I parsed both with gcc -S . The first seems perfectly clear:

  .file "test.c" .text .globl main .type main, @function main: .LFB0: .cfi_startproc pushq %rbp .cfi_def_cfa_offset 16 movq %rsp, %rbp .cfi_offset 6, -16 .cfi_def_cfa_register 6 movl $0, -8(%rbp) movl $0, -4(%rbp) jmp .L2 .L3: addl $1, -8(%rbp) addl $1, -4(%rbp) .L2: cmpl $1999999999, -4(%rbp) jle .L3 leave .cfi_def_cfa 7, 8 ret .cfi_endproc .LFE0: .size main, .-main .ident "GCC: (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2" .section .note.GNU-stack,"",@progbits 

L3 adds, L2 compares -4(%rbp) with 1999999999 and goes to L3 if i < 2000000000 .

Now optimized:

  .file "test.c" .text .p2align 4,,15 .globl main .type main, @function main: .LFB0: .cfi_startproc rep ret .cfi_endproc .LFE0: .size main, .-main .ident "GCC: (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2" .section .note.GNU-stack,"",@progbits 

I can’t understand what is going on there! I have little knowledge about the assembly, but I was expecting something like

 addl $2000000000, -8(%rbp) 

I even tried using gcc -c -g -Wa, -a, -ad -O2 test.c to see the C code along with the assembly into which it was converted, but the result was no more clear that the previous one.

Can someone explain briefly:

  • Output gcc -S -O2 .
  • If the loop is optimized, as I expected (one sum instead of many sums)?
+63
c compiler-optimization gcc disassembly
Jan 12 2018-12-12T00:
source share
2 answers

The compiler is even smarter. :)

In fact, he understands that you are not using the result of the loop. Thus, he completely captured the entire cycle!

This is called Dead Code Elimination .

The best test is printing the result:

 #include <stdio.h> int main(void) { int i; int count = 0; for(i = 0; i < 2000000000; i++){ count = count + 1; } // Print result to prevent Dead Code Elimination printf("%d\n", count); } 

EDIT: I added the required #include <stdio.h> ; the MSVC build list matches the version without #include , but it should be the same.




I do not have GCC in front of me at the moment, since I am loaded into Windows. But here is the disassembly of the version from printf() on MSVC:

EDIT: I had the wrong collection. Here is the correct one.

 ; 57 : int main(){ $LN8: sub rsp, 40 ; 00000028H ; 58 : ; 59 : ; 60 : int i; int count = 0; ; 61 : for(i = 0; i < 2000000000; i++){ ; 62 : count = count + 1; ; 63 : } ; 64 : ; 65 : // Print result to prevent Dead Code Elimination ; 66 : printf("%d\n",count); lea rcx, OFFSET FLAT:??_C@_03PMGGPEJJ@?$CFd?6?$AA@ mov edx, 2000000000 ; 77359400H call QWORD PTR __imp_printf ; 67 : ; 68 : ; 69 : ; 70 : ; 71 : return 0; xor eax, eax ; 72 : } add rsp, 40 ; 00000028H ret 0 

So, Visual Studio does this optimization. I would suggest that GCC probably does too.

And yes, GCC does a similar optimization. Here is the list of builds for the same program with gcc -S -O2 test.c (gcc 4.5.2, Ubuntu 11.10, x86):

  .file "test.c" .section .rodata.str1.1,"aMS",@progbits,1 .LC0: .string "%d\n" .text .p2align 4,,15 .globl main .type main, @function main: pushl %ebp movl %esp, %ebp andl $-16, %esp subl $16, %esp movl $2000000000, 8(%esp) movl $.LC0, 4(%esp) movl $1, (%esp) call __printf_chk leave ret .size main, .-main .ident "GCC: (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2" .section .note.GNU-stack,"",@progbits 
+72
Jan 12 2018-12-12T00:
source share

Compilers have several tools at their disposal to make the code more efficient or more "efficient":

  • If the result of the calculation is never used, the code that performs the calculation may be omitted (if the calculation acted on volatile values, these values ​​should still be read, but the read results may be ignored). If the results of the calculations that supplied it were not used, the code that performs them can also be omitted. If such an omission makes the code the same for both paths on the conditional branch, the condition can be considered unused and omitted. This will not affect the behavior (other than the runtime) of any program that does not make border access images or causes application L to call “Critical Undefined Behaviors”.

  • If the compiler determines that the machine code that calculates the value can only give results in a certain range, it can omit any conditional tests whose results could be predicted on this basis. As above, this will not affect behavior other than runtime if the code does not cause Critical Undefined Behaviors.

  • If the compiler determines that certain inputs will invoke any form of Undefined Behavior with code written, the standard will allow the compiler to omit any code that matters only when such inputs are received, even if the natural behavior of the execution platform, given such inputs, was would be benign, and rewriting the compiler would make it dangerous.

Good compilers do # 1 and # 2. For some reason, however, number 3 has become fashionable.

0
Jul 14 '16 at 15:45
source share



All Articles