I wrote this simple C program:
```c
int main()
{
    int i;
    int count = 0;
    for (i = 0; i < 2000000000; i++) {
        count = count + 1;
    }
}
```
I wanted to see how the gcc compiler optimizes this loop (explicitly adding 1 a total of 2,000,000,000 times should become "add 2,000,000,000 once"). So:
`gcc test.c`, and then `time` on `a.out`, gives:

```
real    0m7.717s
user    0m7.710s
sys     0m0.000s
```
`gcc -O2 test.c`, and then `time` on `a.out`, gives:

```
real    0m0.003s
user    0m0.000s
sys     0m0.000s
```
Then I inspected both versions with `gcc -S`. The first seems perfectly clear:
```
	.file	"test.c"
	.text
.globl main
	.type	main, @function
main:
.LFB0:
	.cfi_startproc
	pushq	%rbp
	.cfi_def_cfa_offset 16
	movq	%rsp, %rbp
	.cfi_offset 6, -16
	.cfi_def_cfa_register 6
	movl	$0, -8(%rbp)
	movl	$0, -4(%rbp)
	jmp	.L2
.L3:
	addl	$1, -8(%rbp)
	addl	$1, -4(%rbp)
.L2:
	cmpl	$1999999999, -4(%rbp)
	jle	.L3
	leave
	.cfi_def_cfa 7, 8
	ret
	.cfi_endproc
.LFE0:
	.size	main, .-main
	.ident	"GCC: (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2"
	.section	.note.GNU-stack,"",@progbits
```
`.L3` increments both `count` (`-8(%rbp)`) and `i` (`-4(%rbp)`); `.L2` compares `-4(%rbp)` with `1999999999` and jumps back to `.L3` while `i < 2000000000`.
Now optimized:
```
	.file	"test.c"
	.text
	.p2align 4,,15
.globl main
	.type	main, @function
main:
.LFB0:
	.cfi_startproc
	rep
	ret
	.cfi_endproc
.LFE0:
	.size	main, .-main
	.ident	"GCC: (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2"
	.section	.note.GNU-stack,"",@progbits
```
I can't understand what is going on there! I have little knowledge of assembly, but I was expecting something like

```
addl $2000000000, -8(%rbp)
```
I even tried using `gcc -c -g -Wa,-a,-ad -O2 test.c` to see the C code along with the assembly into which it was converted, but the result was no clearer than the previous one.
Can someone briefly explain:

- the output of `gcc -S -O2`?
- whether the loop is optimized the way I expected (one addition instead of many additions)?