When can I confidently compile a program with -O3?

I saw a lot of people complaining about the -O3 option:

GCC: the program does not work with the -O3 compilation option

A floating-point issue pointed out by David Hamman

I checked the GCC manual:

-O3 Optimize yet more. -O3 turns on all optimizations specified by -O2 and also turns on the -finline-functions and -frename-registers options. 

I also checked the GCC source to confirm that these two options are the only optimizations added by -O3:

    if (optimize >= 3)
      {
        flag_inline_functions = 1;
        flag_rename_registers = 1;
      }

For these two optimizations:

-finline-functions is useful in some cases (mainly with C++), because it lets you control the size limit for functions considered for inlining (600 by default) with -finline-limit. The compiler may report an out-of-memory error when a high limit is set.

-frename-registers tries to avoid false dependencies in scheduled code by making use of registers left over after register allocation. This optimization benefits most on processors with a large number of registers.

As for function inlining: although it reduces the number of function calls, it can produce large binaries, so -finline-functions can introduce severe cache penalties and end up slower than -O2. I think the cache penalty depends on more than just the program itself.
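
For example, here is a minimal sketch of the trade-off (the names are mine, purely illustrative):

    // Tiny callee: a good inlining candidate, since the call overhead
    // dwarfs the actual work.
    static double norm2(double x, double y) { return x * x + y * y; }

    // Hot loop: with -finline-functions the call disappears. If the
    // callee were large, copying its body into every call site would
    // grow the binary and could evict hotter code from the instruction
    // cache.
    double total(const double* xs, const double* ys, int n) {
        double s = 0.0;
        for (int i = 0; i < n; ++i)
            s += norm2(xs[i], ys[i]);
        return s;
    }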

As for register renaming: I don't think it will have any positive effect on a CISC architecture such as x86.
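
A rough sketch of where the flag is supposed to help (my own illustration, assuming the compiler unrolls the loop):

    // After the compiler unrolls this loop, each unrolled copy of the
    // body may be assigned the same scratch register for 'tmp',
    // creating write-after-write hazards between otherwise independent
    // iterations. -frename-registers tries to give each copy a
    // different leftover register so the CPU can overlap them; with
    // only 8 general-purpose registers on 32-bit x86 there is often
    // nothing left over, which is why the flag tends to pay off more
    // on register-rich architectures.
    void scale(float* dst, const float* src, float k, int n) {
        for (int i = 0; i < n; ++i) {
            float tmp = src[i] * k;
            dst[i] = tmp;
        }
    }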

My question consists of 2.5 parts:

[Answered] 1. Am I right that whether a program runs faster with -O3 depends on the underlying platform/architecture?

EDIT: The first part is confirmed as true. David Hamman also points out that we must be very careful about how optimization and floating-point operations interact on machines with extended-precision floating point, such as Intel and AMD.

2. When can I confidently use the -O3 option? I believe these two optimizations, especially register renaming, may lead to behavior that differs from -O0/-O2. I have seen some programs compiled with -O3 break at runtime; is that deterministic? If I run the executable once without any failure, does that mean it is safe to use -O3?

EDIT: Determinism has nothing to do with optimization; it is a multithreading issue. However, for a multithreaded program, running the executable once without errors does not make -O3 safe. David Hamman shows that -O3 optimization of floating-point operations may violate the strict weak ordering required of a comparison function. Are there any other concerns to take care of when we want to use the -O3 option?

[Answered] 3. If the answer to the first question is yes, then when changing the target platform, or in a distributed system with heterogeneous machines, I may need to switch between -O3 and -O2. Are there any general ways to decide whether -O3 will improve performance? For example, more registers, short inlineable functions, etc.

EDIT: Luen answered the third part: the variety of platforms makes a general discussion of this problem impossible. When evaluating the performance gain of -O3, we should build with both options and benchmark our code to see which is faster.

2 answers
  1. I saw that some programs broke when compiled with -O3; is this deterministic?

If the program is single-threaded, all the algorithms used by the program are deterministic, and the inputs are identical from run to run, then yes. The answer is "not necessarily" if any of those conditions does not hold.

The same holds if you compile without -O3.
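
For example, a minimal sketch (my own, not from the original program) of a data race that makes runs differ at any optimization level (build with -pthread):

    #include <iostream>
    #include <thread>

    int counter = 0;   // shared and unsynchronized: a data race

    // Two threads increment without synchronization, so the final
    // value can differ from run to run at -O0, -O2 and -O3 alike; one
    // clean run proves nothing.
    int main() {
        std::thread t1([] { for (int i = 0; i < 100000; ++i) ++counter; });
        std::thread t2([] { for (int i = 0; i < 100000; ++i) ++counter; });
        t1.join();
        t2.join();
        std::cout << counter << '\n';   // frequently prints != 200000
    }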

If I run the executable file once without any failure, does this mean that it is safe to use -O3?

Of course not. Again, the same holds if you compile without -O3. Just because your application runs correctly once does not mean it will run correctly in all cases. This is part of what makes testing a hard problem.


Floating-point operations can lead to strange behavior on machines whose floating-point registers have more precision than a double. For instance:

    // FunkyAdditionError is illustrative; a one-line definition is
    // added here so the snippet compiles on its own.
    struct FunkyAdditionError { double value; explicit FunkyAdditionError(double v) : value(v) {} };

    void add(double a, double b, double& result) {
        double temp = a + b;   // temp may stay in an extended-precision register
        result = temp;         // result is written back as a 64-bit double
        if (result != temp) {
            throw FunkyAdditionError(temp);
        }
    }

Compile a program that uses this add function without optimization and you will probably never see a FunkyAdditionError. Compile with optimization and some inputs will unexpectedly start triggering the exception. The problem is that, when optimizing, the compiler keeps temp in a register, while result, being a reference, is written back to memory as a double. Add an inline qualifier and the exceptions may disappear when you compile with -O3, because then result can also be kept in a register. Optimization in the presence of floating-point operations can be a tricky business.

Finally, let's look at one of those cases where things went wrong when a program was compiled with -O3: GCC: the program does not work with the -O3 compilation option. The problem arose only with -O3 because the compiler presumably inlined the distance function but kept one (and only one) of the results in an extended-precision floating-point register. With this optimization, certain points p1 and p2 can make both p1 < p2 and p2 < p1 evaluate to true. This violates the strict weak ordering required of a comparison function.
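
A hypothetical reconstruction of that pattern (the names and types are illustrative, not taken from the linked question):

    #include <cmath>

    struct Point { double x, y; };

    // With x87 math, if the compiler inlines distance() and keeps one
    // call's result in an 80-bit register while spilling the other to
    // a 64-bit double, then for points whose distances differ only
    // beyond double precision, both less(p1, p2) and less(p2, p1) can
    // evaluate to true, breaking the strict weak ordering that
    // std::sort relies on.
    static double distance(const Point& p) {
        return std::sqrt(p.x * p.x + p.y * p.y);
    }

    static bool less(const Point& a, const Point& b) {
        return distance(a) < distance(b);
    }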

You need to be very careful about how optimization and floating-point operations interact on machines with extended-precision floating point (for example, Intel and AMD).

1) and 3): you are right. Some programs can take advantage of the optimizations enabled by -O3 and some cannot. For example, inlining more functions is usually better (since it skips the bureaucracy of the function-call mechanism), but sometimes it can make things slower (for example, by degrading cache locality). This, and the variety of platforms, makes a general discussion of the problem impossible.

So, to cut it short, the only correct answer is: try both and benchmark your code to see which is faster.
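
For example, a minimal timing harness (workload() is a placeholder; swap in the code you actually care about), built once with -O2 and once with -O3:

    #include <chrono>
    #include <cstdio>

    // Placeholder workload; replace with the code you want to measure.
    static double workload() {
        double s = 0.0;
        for (int i = 1; i <= 10000000; ++i)
            s += 1.0 / i;
        return s;
    }

    int main() {
        using clock = std::chrono::steady_clock;
        auto start = clock::now();
        volatile double sink = workload();   // volatile: keep the result observable
        auto stop = clock::now();
        (void)sink;
        auto us = std::chrono::duration_cast<std::chrono::microseconds>(stop - start);
        std::printf("%lld us\n", (long long) us.count());
    }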

2) Assuming you have not hit a compiler/optimizer bug (they are rare, but they do exist), it is reasonable to assume the bug is in your program. If it shows up only with -O3, it was probably there all along; -O3 merely exposed it.
