Why don't modern C ++ compilers optimize simple loops? (Clang, MSVC)

Question

Why don't modern C ++ compilers optimize simple loops? (Clang, MSVC)

When I compile and run this code using Clang ( -O3 ) or MSVC ( /O2 ) ...

 #include <stdio.h> #include <time.h> static int const N = 0x8000; int main() { clock_t const start = clock(); for (int i = 0; i < N; ++i) { int a[N]; // Never used outside of this block, but not optimized away for (int j = 0; j < N; ++j) { ++a[j]; // This is undefined behavior (due to possible // signed integer overflow), but Clang doesn't see it } } clock_t const finish = clock(); fprintf(stderr, "%u ms\n", static_cast<unsigned int>((finish - start) * 1000 / CLOCKS_PER_SEC)); return 0; }

... the loop is not optimized.

In addition, neither Clang 3.6 and Visual C ++ 2013 and GCC 4.8.1 tells me that the variable is uninitialized!

Now I understand that the lack of optimization is not a mistake per se, but I find it surprising given that compilers should be pretty smart these days. It looks like such a simple piece of code that even viability analysis methods from a decade ago should be able to optimize the variable a and, therefore, the whole cycle - do not pay attention to the fact that the increment of the variable is already undefined.

But only GCC can understand that this is no-op, and not one of the compilers tells me that it is an uninitialized variable.

Why is this? What prevents a simple viability analysis from telling the compiler that a not in use? Moreover, why does the compiler not detect that a[j] uninitialized in the first place? Why don't existing uninitialized detector variables in all of these compilers catch this obvious error?

+24

c ++ compiler-optimization gcc visual-c ++ clang

Mehrdad Aug 6 '14 at 2:52

source share

4 answers

Richard Smith · Answer 1 · 2014-08-06 22:24

The behavior of undefined does not matter here. Replacing the internal circuit with:

  for (int j = 1; j < N; ++j) { a[j-1] = a[j]; a[j] = j; }

... has the same effect, at least with Clang.

The problem is that the inner loop is loaded from a[j] (for some j ) and stored in a[j] (for some j ). None of the stores can be deleted, because the compiler believes that they can be visible for later loads, and none of the loads can be deleted, because their values are used (as input to later stores). As a result, the loop still has side effects in memory, so the compiler does not see that it can be deleted.

Unlike nm's answer, replacing int with unsigned will not solve the problem. The code generated by Clang 3.4.1 using int and using unsigned int is identical.

James Kanze · Answer 2 · 2014-08-06 09:44

This is an interesting optimization problem. I would expect that in most cases the compiler will treat each element of the array as an individual variable when executing dead code analysis. Ans 0x8000 make too many separate variables for the track, so the compiler does not try. The fact that a[j] not always access to the same object can cause problems as well as for the optimizer.

Obviously, different compilers use different heuristics; the compiler can consider the array as one object and detect that it never influenced the result (observed behavior). Some compilers may not do this, however, on the grounds that, as a rule, this is a big job for very little gain: how often will such optimizations be applied in real code?

bames53 · Answer 3 · 2014-08-11 17:05

 ++a[j]; // This is undefined behavior too, but Clang doesn't see it

Are you saying this behavior is undefined because the elements of the array are uninitialized?

If so, although this is a standard interpretation of clause 4.1 / 1 in the standard, I believe that this is not true. Elements are "uninitialized" in the sense that programmers usually use this term, but I do not believe that this exactly matches the C ++ language specification.

In particular, C ++ 11 8.5 / 11 states that these objects are actually initialized by default, and this seems to me mutually exclusive, since it is not initialized. The standard also states that for some objects that are initialized by default, it means "not initialized." Some may assume that this means that they are uninitialized, but this is not indicated, and I just think that such performance is not required.

The specification clearly states that array elements will have undefined values. C ++ indicates, referring to the C standard, that undefined values can be either valid representations, or legal to access regular, or trap representations. If the specific undefined values of the array elements with all of them are real representations (and none of them is INT_MAX, avoiding overflow), then the above line does not cause any undefined behavior in C ++ 11.

Since these elements of the array can be traps, it would be perfectly consistent with the fact that clang would act as if they were guaranteed to be trap representations, effectively choosing the UB code to create optimization opportunities.

Even if clang does not do this, it can still choose to optimize based on the data stream. Klang knows how to do this, as evidenced by the fact that if the inner cycle changes slightly, then the cycles are deleted.

So, why does the (optional) presence of UB seem to optimize optimization when UB is usually seen as an opportunity for more optimization?

What could happen is that clang decided that users want int traps based on the behavior of the equipment. And therefore, instead of using traps as an optimization opportunity, clang should generate code that accurately reproduces the behavior of the program in hardware. This means that loops cannot be excluded based on the data stream, as this can eliminate hardware traps.

C ++ 14 updates the behavior, so that access to the values of undefined values itself creates undefined behavior, regardless of whether this variable is considered uninitialized or not:

nh_ · Answer 4 · 2014-08-06 08:25

It is really very interesting. I tried your example with MSVC 2013. My first idea was that the fact that ++ a [j] is somewhat undefined is the reason that the loop is not deleted, since deleting this will definitely change the program value from undefined / Incorrect semantics into something meaningful, so I tried to initialize the values earlier, but the loops still don't disappear.

Then I replaced ++ a [j]; with a [j] = 0; which then generated the output without any loop, so everything that was between the two clock () calls was deleted. I can only guess about the reason. Perhaps the optimizer cannot prove that the ++ operator has no side effects for any reason.

Why don't modern C ++ compilers optimize simple loops? (Clang, MSVC)

More articles: