This is functionally identical to the code below:
for (int i = 0; i < count; i++) { *to++ = *from++; }
The difference is that your code unrolls the loop, so only 1 loop iteration is required for every 8 copies. Since none of the cases ends with a break statement, execution falls through from each case label to the next.
When count % 8 == 0, 8 copies are executed on the first pass through the loop
When count % 8 == 7, 7 copies are executed on the first pass
and so on. After the first pass, which handles the count % 8 leftover copies, every later iteration performs exactly 8 copies.
By unrolling the loop in this way, the loop overhead (the counter decrement, test and branch) is paid once per 8 copies instead of once per copy. It is important to note the order of the case values (0, 7, 6, 5, 4, 3, 2, 1), which lends itself to being translated into a jump table by the compiler.
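For illustration, here is a minimal sketch of the construct described above, assuming a memory-to-memory copy (*to++) like the simple loop at the top. The name duff_copy and the size_t parameter are my own choices, not taken from your code, and the sketch assumes count > 0.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copies count bytes from `from` to `to`.
   Assumption: count > 0 (enforced by the assert below). */
void duff_copy(char *to, const char *from, size_t count)
{
    assert(count > 0);

    size_t n = (count + 7) / 8;   /* number of passes through the do-while */

    switch (count % 8) {          /* jump into the middle of the loop body */
    case 0: do { *to++ = *from++; /* no break anywhere: each case falls through */
    case 7:      *to++ = *from++;
    case 6:      *to++ = *from++;
    case 5:      *to++ = *from++;
    case 4:      *to++ = *from++;
    case 3:      *to++ = *from++;
    case 2:      *to++ = *from++;
    case 1:      *to++ = *from++;
            } while (--n > 0);    /* every later pass performs all 8 copies */
    }
}
```

With count = 11, the switch jumps to case 3, so the first pass does 3 copies and the second pass does the remaining 8.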
Update
The problem with the example code posted by the OP is that a count of 0 still jumps to case 0 and runs the loop body once, resulting in 8 copies, which could lead to a buffer overflow.
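One possible way to guard against that is to return early when there is nothing to copy. This is only a sketch: the name duff_copy_checked, the size_t parameter and the return value are hypothetical, and it assumes the pass count is computed as (count + 7) / 8, as in the usual form of this construct.

```c
#include <stddef.h>

/* Hypothetical guarded variant: copies count bytes and returns
   the number of bytes actually copied. */
size_t duff_copy_checked(char *to, const char *from, size_t count)
{
    if (count == 0)
        return 0;                 /* without this check, case 0 would run 8 copies */

    size_t n = (count + 7) / 8;   /* number of passes through the do-while */

    switch (count % 8) {
    case 0: do { *to++ = *from++;
    case 7:      *to++ = *from++;
    case 6:      *to++ = *from++;
    case 5:      *to++ = *from++;
    case 4:      *to++ = *from++;
    case 3:      *to++ = *from++;
    case 2:      *to++ = *from++;
    case 1:      *to++ = *from++;
            } while (--n > 0);
    }
    return count;
}
```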
David Lively, Nov 12 '09 at 16:03