Does gcc automatically expand if-statements?

Let's say I have a loop that looks like this:

for(int i = 0; i < 10000; i++) { /* Do something computationally expensive */ if (i < 200 && !(i%20)) { /* Do something else */ } } 

when some trivial task gets stuck behind an if statement, which is executed only a few times. I always heard that "if-statements in loops are slow!" Thus, in the hope of (slightly) increasing performance, I divided the loops into:

 for(int i = 0; i < 200; i++) { /* Do something computationally expensive */ if (!(i%20)) { /* Do something else */ } } for(int i = 200; i < 10000; i++) { /* Do something computationally expensive */ } 

Will gcc (with corresponding flags, for example -O3) automatically split one cycle into two, or does it just expand to reduce the number of iterations?

+8
optimization c gcc
source share
1 answer

Why not just take it apart and see for yourself? But here we go. This is a test program:

 int main() { int sum = 0; int i; for(i = 0; i < 10000; i++) { if (i < 200 && !(i%20)) { sum += 0xC0DE; } sum += 0xCAFE; } printf("%d\n", sum); return 0; } 

and this is the interesting part of the disassembled code compiled with gcc 4.3.3 and -o3:

 0x08048404 <main+20>: xor ebx,ebx 0x08048406 <main+22>: push ecx 0x08048407 <main+23>: xor ecx,ecx 0x08048409 <main+25>: sub esp,0xc 0x0804840c <main+28>: lea esi,[esi+eiz*1+0x0] 0x08048410 <main+32>: cmp ecx,0xc7 0x08048416 <main+38>: jg 0x8048436 <main+70> 0x08048418 <main+40>: mov eax,ecx 0x0804841a <main+42>: imul esi 0x0804841c <main+44>: mov eax,ecx 0x0804841e <main+46>: sar eax,0x1f 0x08048421 <main+49>: sar edx,0x3 0x08048424 <main+52>: sub edx,eax 0x08048426 <main+54>: lea edx,[edx+edx*4] 0x08048429 <main+57>: shl edx,0x2 0x0804842c <main+60>: cmp ecx,edx 0x0804842e <main+62>: jne 0x8048436 <main+70> 0x08048430 <main+64>: add ebx,0xc0de 0x08048436 <main+70>: add ecx,0x1 0x08048439 <main+73>: add ebx,0xcafe 0x0804843f <main+79>: cmp ecx,0x2710 0x08048445 <main+85>: jne 0x8048410 <main+32> 0x08048447 <main+87>: mov DWORD PTR [esp+0x8],ebx 0x0804844b <main+91>: mov DWORD PTR [esp+0x4],0x8048530 0x08048453 <main+99>: mov DWORD PTR [esp],0x1 0x0804845a <main+106>: call 0x8048308 <__printf_chk@plt> 

So, as we see, there is no concrete example for this, it is not. We have only one cycle, starting with main + 32 and ending with main + 85. If you have problems reading the assembly code ecx = i; ebx = sum.

But still, your mileage may differ - who knows what heuristics are used for this particular case, so you have to compile the code that you have in mind and see how more / more complex calculations affect the optimizer.

Despite the fact that on any modern processor, the industry predictor will be very good in such simple code, so you will not see big performance losses in any case. What performance loss might be a small misprediction if your computationally intensive code needs billions of cycles?

+11
source share

All Articles