Compiler instruction-reordering optimization in C++ (and what prevents it)

I reduced my code to the following, which is as simple as I could make it while still preserving the compiler behavior I'm interested in.

```cpp
#include <cstdint>

// some_global_array and bar() are declared elsewhere.
void foo(const uint64_t used)
{
    uint64_t ar[100];
    for (int i = 0; i < 100; ++i) {
        ar[i] = some_global_array[i];
    }

    const uint64_t mask = ar[0];
    if ((used & mask) != 0) {
        return;
    }

    bar(ar); // Not inlined
}
```

Using VC10 with /O2 and /Ob1, the generated assembly largely follows the order of the statements in the C++ code above. Since the local array ar is only passed to bar() when the condition fails and is not used otherwise, I expected the compiler to turn it into something like the following:

```cpp
if ((used & some_global_array[0]) != 0) {
    return;
}
// Now do the copying to ar and call bar(ar)...
```

Does the compiler not do this because such optimizations are too difficult to identify in the general case? Or is there some strict rule that prohibits it? If so, why, and is there any way I can hint to the compiler that this reordering will not change the semantics of my program?

Note: obviously it would be trivial to get the optimized output by rearranging the code myself, but I'm asking why the compiler doesn't optimize cases like this, not how to work around it in this (intentionally simplified) case.

2 answers

There are no "strict rules" dictating what assembly the compiler must emit; under the as-if rule it may generate any code whose observable behavior matches that of the source. If the compiler can prove that, given some precondition, a block of code does not need to run (because it has no side effects), it is entirely free to skip it.
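To make "allowed to skip it" concrete: the hoisted ordering is legal under the as-if rule because no conforming program can tell it apart from the original. The sketch below puts the question's ordering next to the hoisted one and checks that bar() receives the same calls with the same data. The array size, bar()'s signature, and the BarLog recorder are assumptions made for illustration, and the comparison assumes that reading some_global_array cannot fault.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>

// Assumed stand-ins for the question's declarations: a 100-element
// global array and a bar() that receives the local copy.
uint64_t some_global_array[100];

// Records what bar() observed, so the two orderings can be compared.
struct BarLog {
    int calls = 0;
    std::array<uint64_t, 100> last{};
};
BarLog g_log;

void bar(uint64_t* ar) {
    ++g_log.calls;
    std::copy(ar, ar + 100, g_log.last.begin());
}

// The question's ordering: copy everything first, then test.
void foo_original(const uint64_t used) {
    uint64_t ar[100];
    for (int i = 0; i < 100; ++i) ar[i] = some_global_array[i];
    const uint64_t mask = ar[0];
    if ((used & mask) != 0) return;
    bar(ar);
}

// The hoisted ordering: test first, copy only when bar() will be called.
void foo_hoisted(const uint64_t used) {
    if ((used & some_global_array[0]) != 0) return;
    uint64_t ar[100];
    for (int i = 0; i < 100; ++i) ar[i] = some_global_array[i];
    bar(ar);
}

// Runs one version from a clean log and returns what bar() saw.
BarLog run(void (*f)(uint64_t), uint64_t used) {
    g_log = BarLog{};
    f(used);
    return g_log;
}

// True if both versions call bar() the same number of times with the same data.
bool same_observable_behavior(uint64_t used) {
    const BarLog a = run(foo_original, used);
    const BarLog b = run(foo_hoisted, used);
    return a.calls == b.calls && a.last == b.last;
}
```

The only difference between the two functions is when the 800 bytes are copied, and nothing outside foo can observe that; this is the kind of reasoning the compiler has to carry out before it may reorder.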

Such optimization can be quite difficult in the general case, and your compiler may not go to all that effort. If this is performance-critical code, you can restructure the source (as you suggest) to help the compiler generate better assembly. This is a trial-and-error process, and you may have to repeat it for the next version of the compiler.


Probably the reason this is not being optimized is the global array. The compiler cannot know in advance whether, say, reading some_global_array[99] will raise an exception or signal, so it must execute the whole loop. The situation would be completely different if the global array were statically defined in the same translation unit.

For example, with LLVM, the following three definitions of the global array produce very different code for this function:

```cpp
// this yields pretty much what you're seeing
uint64_t *some_global_array;

// this calls memcpy and then performs the conditional check
uint64_t some_global_array[100] = {0};

// this calls memset (not memcpy!) on the ar array and then bar directly (no
// conditional checks since the array is const and filled with 0s, so the if
// is always false)
const uint64_t some_global_array[100] = {0};
```

The second result is rather puzzling; it may simply be a missed optimization (or perhaps I'm overlooking something).
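One way to reproduce this comparison is a single file that selects the definition with a preprocessor macro, so each variant can be compiled to assembly separately (for example with `clang++ -O2 -S -DVARIANT=3 repro.cpp`). The VARIANT macro, the file name, the zero-filled buffer behind the pointer variant, and the recording bar() are assumptions of this sketch, not part of the answer. Because bar() is defined in the same file here, the compiler can see its body; to match the question's setup more closely, move bar() to another translation unit before inspecting the assembly.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical switch: compile with -DVARIANT=1, 2 or 3 to pick a definition.
#ifndef VARIANT
#define VARIANT 2
#endif

#if VARIANT == 1
// A pointer: the compiler knows nothing about what it points at.
// The zero-filled backing buffer exists only so this sketch can run.
static uint64_t backing_storage[100] = {};
uint64_t *some_global_array = backing_storage;
#elif VARIANT == 2
// A mutable array of known size, defined in this translation unit.
uint64_t some_global_array[100] = {0};
#else
// A const, zero-filled array: (used & some_global_array[0]) is always 0.
const uint64_t some_global_array[100] = {0};
#endif

// Recording stand-in for the question's bar(); noinline mirrors "Not inlined".
int g_bar_calls = 0;
uint64_t g_last_value = 1;

[[gnu::noinline]] void bar(uint64_t *ar) {
    ++g_bar_calls;
    g_last_value = ar[99];  // read the copy so it cannot be discarded
}

// The question's function, unchanged apart from formatting.
void foo(const uint64_t used) {
    uint64_t ar[100];
    for (int i = 0; i < 100; ++i) {
        ar[i] = some_global_array[i];
    }
    const uint64_t mask = ar[0];
    if ((used & mask) != 0) {
        return;
    }
    bar(ar);
}
```

With all three definitions the array holds zeros, so mask is 0 and bar() is reached for every value of used; only the generated code differs between the variants.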

