Static Branch Prediction / GCC Optimization

Consider the following C program:

void bar();
void baz();

void foo( int a )
{
    if ( a ) {
        bar();
    } else {
        baz();
    }
}

On my x86-64 machine, the instructions generated by GCC at optimization level -O1 are:

   0:   sub    $0x8,%rsp
   4:   test   %edi,%edi
   6:   je     14 <foo+0x14>
   8:   mov    $0x0,%eax
   d:   callq  12 <foo+0x12>    # relocation to bar
  12:   jmp    1e <foo+0x1e>
  14:   mov    $0x0,%eax
  19:   callq  1e <foo+0x1e>    # relocation to baz
  1e:   add    $0x8,%rsp
  22:   retq

whereas adding the -freorder-blocks optimization flag (enabled at -O2) turns the code into:

   0:   sub    $0x8,%rsp
   4:   test   %edi,%edi
   6:   jne    17 <foo+0x17>
   8:   mov    $0x0,%eax
   d:   callq  12 <foo+0x12>    # relocation to baz
  12:   add    $0x8,%rsp
  16:   retq
  17:   mov    $0x0,%eax
  1c:   callq  21 <foo+0x21>    # relocation to bar
  21:   add    $0x8,%rsp
  25:   retq

The essential difference is that the je has been switched to a jne. I know that static branch prediction on processors before the Pentium 4 assumed that a conditional forward branch is not taken (it seems static prediction became effectively random on later Intel processors), so I believe this optimization is related to that.

Assuming this applies to the jne in the optimized version, it would mean that the else block is actually considered more likely to be executed than the if block.

But what does that mean? Since the compiler can make no assumption about the value passed to foo, this likelihood can only come from how the programmer wrote the code (and the programmer could just as well have written if ( !a ) instead of if ( a ) and swapped the two calls).
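For concreteness, the inverted spelling mentioned above would be the following, which is semantically identical to the original foo:

void foo( int a )
{
    if ( !a ) {
        baz();
    } else {
        bar();
    }
}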

Does this mean that it should be considered good practice to treat the if block as the exceptional case and the else block as the regular flow of execution? I.e.:

if ( !cond ) {
    // exceptional code
} else {
    // normal continuation
}

instead of:

if ( cond ) {
    // normal continuation
} else {
    // exceptional code
}

(Of course, you might prefer to use a return statement inside the exceptional block to limit the nesting depth.)
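A minimal sketch of that early-return style (the function and parameter names are made up for illustration):

void handle( int cond )
{
    if ( !cond ) {
        // exceptional code
        return;
    }
    // normal continuation
}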

optimization c assembly gcc x86-64
1 answer

I once did a significant amount of performance optimization work on ARM (7, 9). It was plain C and a pretty dumb compiler (SDT, AFAIR). One of the ways to save CPU cycles was to analyze the if branches and rewrite the conditions so that the normal flow would not break the linear sequence of instructions. This helped both because of the processor's branch prediction unit and because of more efficient use of the code cache.

I think we are seeing a very similar optimization here. In the first fragment, both branches break the linear sequence (the je at offset 6 for one branch, the jmp at offset 12 for the other). In the second fragment, one branch runs straight through to its retq with no jump at all, while the other takes a single jump (no worse than in the first fragment). Note that there are now two retq instructions.

So, as I see it, the point is not je versus jne as such, but block reordering: each branch becomes a linear sequence of instructions, and one of them is entered without any jump at all, bypassing the branch predictor entirely.

Regarding "why GCC prefers one branch over the other": the documentation suggests this may be the result of static branch prediction heuristics (based on the calls inside the translation unit?). In any case, I would recommend playing with __builtin_expect to get a more detailed answer.
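As a minimal sketch of that experiment, GCC's __builtin_expect marks the expected value of a condition; the likely/unlikely macro names below follow a common convention (e.g. in the Linux kernel) and are not part of GCC itself:

#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

void bar();
void baz();

void foo( int a )
{
    if ( unlikely( a ) ) {  /* hint: the if branch is expected to be rare */
        bar();
    } else {
        baz();
    }
}

Comparing the generated code with the hint, without it, and with the hint inverted should show which branch GCC lays out as the straight-line (fall-through) path.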
