Optimization example that includes compiler reordering

C and C ++ compilers can change the order of operations if the as-if rule is executed. What is an example of such reordering performed by the compiler, and what is the potential performance that needs to be accomplished by doing this?

Examples with the participation of any (C / C ++) compiler on any platform are welcome.

+7
c ++ c compiler-optimization
source share
3 answers

Suppose the following operations are performed:

int i=0,j=0; i++; i++; i++; j++; j++; j++; 

Ignoring at the time that the three increments are likely to be optimized by the compiler to one +=3 , you will get a higher throughput of the processor pipeline if you reorder the operations as

 i++; j++; i++; j++; i++; j++; 

since j++ does not need to wait for the result of i++ , while in the previous case, most instructions had data dependency on the previous instruction. In more complex calculations, where there is no easy way to reduce the number of instructions to execute, the compiler can still look at the data dependencies and reorder the instructions so that the command depending on the result of the earlier instruction is as far away from it as possible.

Another example of such optimization is that you are dealing with pure functions . Looking again at a simple example, suppose you have a pure function f(int x) , which you summarize over a loop.

 int tot = 0; int x;//something known only at runtime for(int i = 0; i < 100; i++) tot += f(x); 

Since f is a pure function, the compiler can reorder calls to it as it pleases. In particular, it can convert this loop to

 int tot = 0; int x;//something known only at runtime int fval = f(x); for(int i = 0; i < 100; i++) tot += fval; 
+10
source share

I am sure that there are many examples where reordering operations will provide better performance. An obvious example would be reordering the loads as early as possible, since they are usually much slower than other CPU operations. By doing other, unrelated work while the memory is being removed, the processor can save time in general.

That is, given something like this:

 expensive_calculation(); x = load(); do_something(x); 

We can reorder as follows:

 x = load(); expensive_calculation(); do_something(x); 

So, while we wait for the download to complete, we can do expensive_calculation() for free.

+4
source share

Suppose you have a loop:

 for (i=0; i<n; i++) dest[i] = src[i]; 

Think memcpy . You might want the compiler to vectorize it, i.e. Download 8 or 16 bytes at a time, and then store 8 or 16 bytes at a time. Performing this conversion is a reordering, as it will read src[1] before dest[0] is saved. Moreover, if the compiler does not know that src and dest do not overlap, this is an invalid conversion, that is, the compiler is not allowed. Using the restrict keyword (C99 and newer) allows the compiler to be informed that they do not overlap so that such (extremely valuable) optimization is possible.

The same thing happens in operations with arrays that are not just copied - things like vector / matrix operations, transformations of sound / image sample data, etc.

+4
source share

All Articles