C++: Can the compiler optimize this code segment?

void foo(const int constant)
{
    for (int i = 0; i < 1000000; i++) {
        // do stuff
        if (constant < 10) {   // Condition is tested a million times :(
            // inner loop stuff
        }
    }
}

On every iteration of the outer loop, the value of constant is checked. However, constant never changes, so it seems like a waste of processor time to test the condition constant < 10 again and again. A person would realize after the first few passes that the constant never changes and that there is no point in checking it repeatedly. Does the compiler notice this and optimize intelligently, or is the repeated test inevitable?

Personally, I believe the problem is unavoidable. Even if the compiler performed the comparison once before the outer loop and stored the result in some boolean variable skip_inner_stuff, that variable would still have to be checked on every pass of the outer loop.

What do you think about this? Is there a more efficient way to write the above code segment to avoid the problem?

+7
c++ performance optimization compiler-optimization compiler-construction
7 answers

The optimization you describe is also known as loop unswitching. It has been a standard compiler optimization for many years, but if you want to be sure your compiler performs it, compile your sample code with some level of optimization (e.g. -O2 in gcc) and inspect the generated code.

However, in cases where the compiler cannot prove that a piece of code is invariant throughout the loop (for example, a call to an external function whose body is not available at compile time), manually hoisting that code out of the loop can indeed give a very large performance gain.
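For example, here is a minimal sketch of manual hoisting under that assumption; expensive_check and process are hypothetical stand-ins for a condition and per-iteration work the optimizer cannot see through:

#include <cmath>

// Hypothetical stand-ins (not from the question): imagine expensive_check is
// really defined in another translation unit, so the optimizer cannot prove
// its result is loop-invariant.
bool expensive_check(int value) { return std::sqrt(static_cast<double>(value)) < 3.0; }

long long sum = 0;                    // stand-in for "inner loop stuff" side effects
void process(int i) { sum += i; }

void foo_hoisted(const int constant)
{
    // Evaluate the loop-invariant condition once, before the loop,
    // instead of re-evaluating it on every iteration.
    const bool take_inner = expensive_check(constant);

    for (int i = 0; i < 1000000; i++) {
        // do stuff
        if (take_inner) {
            process(i);               // inner loop stuff
        }
    }
}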

+6

The compiler can optimize the code, but you cannot expect it to perform magic on your code.

Optimization depends heavily on your code and on how that code is used. For example, if you call foo like this:

 foo(12345); 

then the compiler can optimize it heavily. It may even compute the result at compile time.
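As a rough illustration (my sketch, not the answer's own code), this is approximately the specialization constant propagation can produce for foo(12345): the test 12345 < 10 is known to be false at compile time, so the branch and the dead code behind it simply disappear.

// Roughly what the compiler can turn the call foo(12345) into after constant
// propagation: the condition 12345 < 10 is a compile-time false, so the "if"
// vanishes and only the unconditional per-iteration work remains.
void foo_12345()
{
    for (int i = 0; i < 1000000; i++) {
        // do stuff
        // (inner loop stuff removed: it was guarded by an always-false test)
    }
}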

But if you use it like this:

int k;
cin >> k;
foo(k);

then it cannot get rid of the inner if, because the value is only known at run time.

I compiled some sample code with MinGW / GCC 4.8.0:

#include <iostream>
using namespace std;

void foo(const int constant)
{
    int x = 0;
    for (int i = 0; i < 1000000; i++) {
        x++;
        if (constant < 10) {
            x--;
        }
    }
    cout << x << endl;
}

int main()
{
    int k;
    cin >> k;
    foo(k);
}

Look at the generated assembly:

004015E1   MOV EAX,0F4240               // i = 1000000
004015E6   MOV EBP,ESP
004015E8   XOR EDX,EDX                  // x = 0
004015EA   PUSH ESI
004015EB   PUSH EBX
004015EC   SUB ESP,10
004015EF   MOV EBX,DWORD PTR SS:[EBP+8] // EBX = constant
004015F2   XOR ECX,ECX                  // set ECX to 0
004015F4   CMP EBX,0A                   // compare constant with 10
004015F7   SETGE CL                     // CL = 1 if constant >= 10, else 0
004015FA   ADD EDX,ECX                  // x += CL
004015FC   SUB EAX,1                    // i--
004015FF   JNE SHORT 004015F2           // loop while i is not zero

As you can see, the test for the inner if is still inside the loop; see the CMP EBX,0A instruction (though the branch itself has been converted into a branch-free SETGE).

I repeat: it depends heavily on your code and on how you call it.

+3

Others have covered the relevant compiler optimizations: loop unswitching, which moves the test outside the loop and produces two separate loop bodies; and inlining, which in some cases gives the compiler the actual value of constant, so that it can remove the test and either execute the "inner loop stuff" unconditionally or remove it entirely.

Also keep in mind that, whatever the compiler does, modern CPU designs really do something like "a person would realize after the first few passes that the constant never changes." It is called dynamic branch prediction.

The key point is that testing an integer is extremely cheap, and even taking a branch can be very cheap. What is potentially expensive is a mispredicted branch. Modern processors use various strategies to figure out which way a branch will go, but all of them will quickly start predicting a branch correctly when it goes the same way a million times in a row.

What I don't know is whether modern processors are smart enough to notice that constant is loop-invariant and effectively drop the test altogether. But assuming correct branch prediction, loop unswitching is probably a minor optimization here. The more specific the processor family the compiler targets, the more it knows about the quality of that processor's branch predictor, and the more likely it is that the compiler can judge whether the added benefit of unswitching is worth the extra code size.

Of course, there are still minimalist processors out there where the compiler has to provide all the smarts. The CPU in your PC is not one of them.
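If you want to convince yourself of the claim above that a well-predicted branch costs almost nothing, here is a hedged benchmark sketch (my code, not the answer's; function names and iteration counts are arbitrary). With optimization enabled the compiler may transform both versions anyway, so treat the numbers only as a rough comparison:

#include <chrono>
#include <iostream>

// Branchy version: tests the loop-invariant condition on every iteration.
long long with_branch(int constant)
{
    long long x = 0;
    for (int i = 0; i < 100000000; i++) {
        x++;
        if (constant < 10) x--;
    }
    return x;
}

// Manually unswitched version: the test happens exactly once.
long long unswitched(int constant)
{
    long long x = 0;
    if (constant < 10) {
        for (int i = 0; i < 100000000; i++) { x++; x--; }
    } else {
        for (int i = 0; i < 100000000; i++) { x++; }
    }
    return x;
}

int main()
{
    int k;
    std::cin >> k;                          // runtime value, so nothing folds away at compile time

    auto time_ms = [](long long (*f)(int), int arg) {
        auto t0 = std::chrono::steady_clock::now();
        volatile long long r = f(arg);      // volatile keeps the result alive
        auto t1 = std::chrono::steady_clock::now();
        (void)r;
        return std::chrono::duration<double, std::milli>(t1 - t0).count();
    };

    std::cout << "branchy:    " << time_ms(with_branch, k) << " ms\n";
    std::cout << "unswitched: " << time_ms(unswitched, k) << " ms\n";
}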

+2

You can optimize it manually:

void foo(const int constant)
{
    if (constant < 10) {
        for (int i = 0; i < 1000000; i++) {
            // do stuff
            // inner loop stuff here
        }
    } else {
        for (int i = 0; i < 1000000; i++) {
            // do stuff
            // NO inner loop stuff here
        }
    }
}

I don't know whether most compilers do something like this, but it doesn't seem too difficult to do by hand.
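If duplicating the loop body bothers you, one alternative (a sketch of mine, not from the answer) is to make the invariant condition a template parameter, so the body is written once and the compiler instantiates both unswitched versions:

// The invariant condition becomes a compile-time constant in each
// instantiation, so each generated loop either always or never contains
// the inner work. In C++17 the plain "if" could be written "if constexpr".
template <bool InnerStuff>
static void foo_impl()
{
    for (int i = 0; i < 1000000; i++) {
        // do stuff
        if (InnerStuff) {
            // inner loop stuff here
        }
    }
}

void foo(const int constant)
{
    if (constant < 10)
        foo_impl<true>();
    else
        foo_impl<false>();
}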

+1

A good compiler can optimize it.

Compilers optimize based on cost analysis. A good compiler should therefore estimate the cost of each alternative (with and without hoisting the test out of the loop) and choose whichever is cheaper.

This means that if the code inside the loop is large, it may not be worth unswitching, since duplicating it could cause instruction cache misses. On the other hand, if it is small, the test can be hoisted out.

If it shows up in a profiler because it was not optimized, the compiler is broken.

+1

A good compiler optimizes this (when optimization is turned on).

If you use GCC, you can

  • compile with optimization and ask for assembly output using

     gcc -Wall -O2 -fverbose-asm -S source.c 

    then look (with an editor or a pager such as less) at the generated assembly file source.s

  • ask GCC to dump its many (hundreds!) of intermediate files and look at the GIMPLE intermediate representation in them

     gcc -Wall -O2 -fdump-tree-all -c source.c 
  • use MELT and its probe to look interactively inside the GIMPLE.

Heed all the warnings you get with -Wall from gcc (or g++ when compiling C++ code).

By the way, this kind of optimization (loop unswitching, as another answer explains) is important, because such intermediate code appears very often, for example after function inlining ... (imagine that several calls to foo were inlined ...)
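A small hedged sketch (with a hypothetical caller bar, not from the answer) of what that looks like: once foo is inlined with literal arguments, each inlined copy sees a compile-time constant condition, and the test can be dropped:

inline void foo(const int constant)
{
    for (int i = 0; i < 1000000; i++) {
        // do stuff
        if (constant < 10) {
            // inner loop stuff
        }
    }
}

void bar()
{
    foo(3);    // after inlining, "3 < 10" is always true: the test disappears
    foo(42);   // after inlining, "42 < 10" is always false: the inner code is dead
}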

0

In fact, every modern compiler performs this optimization. If you want to keep the compiler from doing it, you have to make the variable volatile.
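For illustration, a minimal sketch (mine, assuming this is what the answer means): making the value volatile forces the compiler to reload and retest it on every pass, so the optimization cannot happen.

void foo(int constant_in)
{
    // volatile tells the compiler the value may change behind its back,
    // so the comparison cannot be hoisted out of the loop.
    volatile int constant = constant_in;
    for (int i = 0; i < 1000000; i++) {
        // do stuff
        if (constant < 10) {      // reloaded and retested on every iteration
            // inner loop stuff
        }
    }
}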

0
