Loop unrolling and loop optimization

Given this code:

for (int i = 0; i < n; ++i) 
{ 
  A(i) ; 
  B(i) ; 
  C(i) ; 
}

And the optimized version:

for (int i = 0; i < (n - 2); i += 3) 
{ 
  A(i) ; 
  A(i+1) ; 
  A(i+2) ; 
  B(i) ; 
  B(i+1) ; 
  B(i+2) ; 
  C(i) ; 
  C(i+1) ; 
  C(i+2) ; 
}

There is something I do not understand: which one is better? I cannot see why the second version would run any faster. Am I missing something?

All I see is that each instruction depends on the previous one, which means I have to wait for the previous instruction to complete before the next one can start...

Thanks

+5
5 answers

Looking at the high-level language, you will not see the optimization. The speedup comes from what the compiler does with what you have written.

In the first case, it is something like:

LOCATION_FLAG;
DO_SOMETHING;
TEST FOR LOOP COMPLETION;//Jumps to LOCATION_FLAG if false

In the second, it is something like:

LOCATION_FLAG;
DO_SOMETHING;
DO_SOMETHING;
DO_SOMETHING;
TEST FOR LOOP COMPLETION;//Jumps to LOCATION_FLAG if false

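As you can see, in the second case the overhead of testing and jumping is only 1 instruction per 3 operations. In the first case it is 1 per 1; so it happens far more often.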

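Therefore, if you have an invariant you can rely on (a count that is 0 mod 3, to use your example), then it is more efficient to unroll the loop, because the underlying code runs more directly.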
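The unrolled loop in the question only covers every index when n is a multiple of 3. A minimal sketch of one common way to handle the leftover iterations is shown here. The bodies of A, B and C and the hits array are hypothetical stand-ins that only record which indices were visited; they are not part of the original code.

```c
#include <assert.h>
#include <string.h>

enum { MAX_N = 16 };

/* hits[f][i] counts how often function f (0 = A, 1 = B, 2 = C) ran for index i. */
static int hits[3][MAX_N];

/* Hypothetical stand-ins for the question's A, B and C. */
static void A(int i) { hits[0][i]++; }
static void B(int i) { hits[1][i]++; }
static void C(int i) { hits[2][i]++; }

static void reset_hits(void) { memset(hits, 0, sizeof hits); }

/* Unrolled by 3, followed by a cleanup loop for the 0, 1 or 2 leftover indices. */
static void run_unrolled(int n)
{
    int i = 0;
    for (; i + 2 < n; i += 3) {
        A(i); A(i + 1); A(i + 2);
        B(i); B(i + 1); B(i + 2);
        C(i); C(i + 1); C(i + 2);
    }
    for (; i < n; ++i) {   /* remainder when n is not a multiple of 3 */
        A(i);
        B(i);
        C(i);
    }
}

/* Returns 1 if A, B and C each ran exactly once for every index below n. */
static int all_visited_once(int n)
{
    assert(n <= MAX_N);
    for (int f = 0; f < 3; ++f)
        for (int i = 0; i < n; ++i)
            if (hits[f][i] != 1)
                return 0;
    return 1;
}
```

Calling reset_hits(), then run_unrolled(10), then all_visited_once(10) should report that index 9 is covered as well; without the cleanup loop it would be skipped. Note that the unrolled body runs A three times before any B, so it matches the original loop only if the calls do not depend on that ordering.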

+9

Loop , , . , .

+4

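Well, whether this code is "better" or "worse" depends entirely on the implementations of A, B and C, on the values of n you expect, and on the compiler and hardware you are using.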

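Typically, the benefit of loop unrolling is that the overhead of running the loop itself (incrementing i and comparing it with n) is reduced. In this case it can be reduced by a factor of 3.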
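To make the loop overhead concrete, here is a small sketch that counts how many times the loop condition is evaluated in each version from the question. The function names tests_rolled and tests_unrolled are made up for this illustration, and the loop bodies are left out because only the bookkeeping is being counted.

```c
/* Counts evaluations of the condition `i < n` in the original loop. */
static int tests_rolled(int n)
{
    int tests = 0;
    int i = 0;
    for (;;) {
        ++tests;               /* one evaluation of the loop condition */
        if (!(i < n))
            break;
        ++i;                   /* A(i); B(i); C(i); would run here */
    }
    return tests;
}

/* Counts evaluations of the condition `i < (n - 2)` in the unrolled loop. */
static int tests_unrolled(int n)
{
    int tests = 0;
    int i = 0;
    for (;;) {
        ++tests;               /* one evaluation of the loop condition */
        if (!(i < (n - 2)))
            break;
        i += 3;                /* the nine unrolled calls would run here */
    }
    return tests;
}
```

For n = 300, tests_rolled returns 301 and tests_unrolled returns 101, so the compare-and-branch work drops to roughly a third. Whether that shows up as a measurable speedup still depends on how expensive A, B and C are compared with the loop bookkeeping.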

+3

A(), B() C() , verion .

, . , , , , .

+2

, "" , , , . , - . , .

, , , , ..

In gcc, loop optimization can be enabled with the flag "-floop-optimize", which is turned on by "-O, -O2, -O3 and -Os".

EDIT: also take a look at "-funroll-loops".

0
