What does this C code [Duff device] do?

void Send(int * to, const int* from, const int count) { int n = (count+7) / 8; switch(count%8) { case 0: do { *to++ = *from++; case 7: *to++ = *from++; case 6: *to++ = *from++; case 5: *to++ = *from++; case 4: *to++ = *from++; case 3: *to++ = *from++; case 2: *to++ = *from++; case 1: *to++ = *from++; } while (--n>0); } } 
+10
c ++
Nov 12 '09 at 15:56
source share
6 answers

This is a Duff Device for copying memory buffers.

+17
Nov 12 '09 at 15:59
source share

This is a Duff Device . This is a loop unfolding method that eliminates the need to add a secondary correction loop to deal with the time when the number of iterations of the loop is not known, to be an exact multiple of the unrolling coefficient.

Since most of the answers here seem to be generally positive about this, I'm going to oppose it.

With this code, the compiler will try to apply any optimization to the body of the loop. If you just wrote the code as a simple loop, a modern compiler should be able to handle the spread for you. Thus, you maintain readability and performance and hope that other optimization methods will be applied to the body of the loop.

The Wikipedia article referenced by others even says that when this “template” was removed from the performance of the Xfree86 source code, it actually improved.

This result is typical for blind manual optimization of any code that you think might be needed. This prevents the compiler from doing its job correctly, making your code less readable and more error prone and usually slows it down. If you are doing everything right, first of all, that is, writing simple code, then profiling for bottlenecks, and then optimizing, you would not even think of using something like this. In any case, not with a modern processor and compiler.

It is great to understand this, but I would be surprised if you ever used it.

+15
Jul 12 2018-12-12T00:
source share

This mix of switch statement and while loop is called a “Duff Device”. This is a loop unrolling method that was an optimization often used in earlier times.

Thus, this code still copies the contents of memory from one place to another, but may be more efficient. Beware that on today's architectures you should always measure this, because with cache locality and a dazzlingly fast loop reversal cycle, it is often a bad idea.

+7
Nov 12 '09 at 16:01
source share

This is functionally identical to the code below:

 for(int i=0;i<n;i++) { *to++=*from++; } 

The difference is that your code expands the loop , so for every 8 whole copies only 1 loop iteration is required. Since there are no gaps for any of the cases , execution is performed from each case label to the next.

When count% 8 == 0, 8 copies are executed inside the loop

when count% 8 == 7, 7 copies are executed for the first iteration

etc. After the first iteration with% 8 copies, exactly 8 copies occur at each iteration.

By deploying the cycle in this way, the cycle overhead is significantly reduced. It is important to note the order of case values ​​(0,7,6,5,4,3,2,1), which can be translated into the transition table by the compiler.

Update

The problem with the example code sent by the OP is that a count of 0 will result in 8 copies, which could lead to a buffer overflow.

+5
Nov 12 '09 at 16:03
source share

Duff device

In computer science , the Duff device is an implementation of a serial copy that uses a technique widely used in assembly language for loop unwinding . Its discovery is attributed to Tom Duff in November 1983, who at that time worked for Lucasfilm . This is perhaps the most dramatic use of case label pass-through in the C programming language to date. Duff does not pretend to be a loan to open the concept of unfolding a cycle , it's just a specific expression in C.

+4
Nov 12 '09 at 16:01
source share

Read about Duff Device

+3
Nov 12 '09 at 15:59
source share



All Articles