Why does the VS C++ 2010 compiler generate different assembly code for a similar function?

Recently I was thinking about strcpy and went back to K&R, where they show the implementation as

while (*dst++ = *src++) ; 

However, I mistakenly rewrote it as:

    while (*dst = *src)
    {
        src++; //technically could be ++src on these lines
        dst++;
    }

In any case, that got me thinking about whether the compiler would actually generate different code for the two. My initial thought was that they should be nearly identical: src and dst are incremented, but their final values are never used, so I assumed the compiler would know not to preserve them as "variables" in the generated machine code.

Using Windows 7 with VS 2010 C++ SP1, building in 32-bit Release mode (/O2), I got the disassembly for both of the above versions. To prevent the function from referencing the input directly and from being inlined, I made a DLL containing each of the functions. I have omitted the prologue and epilogue of the generated assembly.

    while (*dst++ = *src++)
    6EBB1003 8B 55 08     mov   edx,dword ptr [src]
    6EBB1006 8B 45 0C     mov   eax,dword ptr [dst]
    6EBB1009 2B D0        sub   edx,eax  //prepare edx so that edx + eax always points to src
    6EBB100B EB 03        jmp   docopy+10h (6EBB1010h)
    6EBB100D 8D 49 00     lea   ecx,[ecx]  //looks like align padding, never hit this line
    6EBB1010 8A 0C 02     mov   cl,byte ptr [edx+eax]  //ptr [edx+ eax] points to char in src :loop begin
    6EBB1013 88 08        mov   byte ptr [eax],cl  //copy char to dst
    6EBB1015 40           inc   eax  //inc src ptr
    6EBB1016 84 C9        test  cl,cl  // check for 0 (null terminator)
    6EBB1018 75 F6        jne   docopy+10h (6EBB1010h)  //if not goto :loop begin
        ;

I have annotated the code above. It is essentially one loop, with only one check for null and one memory copy per iteration.

Now let's look at my mistaken version:

    while (*dst = *src)
    6EBB1003 8B 55 08     mov   edx,dword ptr [src]
    6EBB1006 8A 0A        mov   cl,byte ptr [edx]
    6EBB1008 8B 45 0C     mov   eax,dword ptr [dst]
    6EBB100B 88 08        mov   byte ptr [eax],cl  //copy 0th char to dst
    6EBB100D 84 C9        test  cl,cl  //check for 0
    6EBB100F 74 0D        je    docopy+1Eh (6EBB101Eh)  // return if we encounter null terminator
    6EBB1011 2B D0        sub   edx,eax
    6EBB1013 8A 4C 02 01  mov   cl,byte ptr [edx+eax+1]  //get +1th char :loop begin
    {
        src++;
        dst++;
    6EBB1017 40           inc   eax
    6EBB1018 88 08        mov   byte ptr [eax],cl  //copy above char to dst
    6EBB101A 84 C9        test  cl,cl  //check for 0
    6EBB101C 75 F5        jne   docopy+13h (6EBB1013h)  // if not goto :loop begin
    }

In my version, I see that it first copies the 0th char to the destination, then checks for null, and then finally enters the loop, where it checks for null again. So the loop stays basically the same, but now the 0th character is handled before the loop. This will of course be suboptimal compared to the first case.

Does anyone know why the compiler fails to generate the same (or nearly the same) code as in the first example? Is this an MS compiler issue, or possibly something in my compiler/linker settings?


Here is the complete code, in 2 files (one version of the function replaces the other).

    // in first dll project
    __declspec(dllexport) void docopy(const char* src, char* dst)
    {
        while (*dst++ = *src++);
    }

    // alternative version (replaces the function above)
    __declspec(dllexport) void docopy(const char* src, char* dst)
    {
        while (*dst = *src)
        {
            ++src;
            ++dst;
        }
    }

    // separate main.cpp file calls docopy
    void docopy(const char* src, char* dst);

    const char* source = "source";
    char destination[100];

    int main()
    {
        docopy(source, destination);
    }
+7
3 answers

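Because in the first example the post-increment always happens, even if src initially points to a null character. In the same starting situation, the second example does not increment the pointers at all. The two functions therefore leave the pointers with different final values, so they are not the same loop.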
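As a rough illustration of that difference, here is a minimal sketch that counts how far src advances in each form. The helper names `steps_kr` and `steps_rewritten` are hypothetical and do not come from the question; they only wrap the two loops so the final pointer position can be observed.

```cpp
#include <cstddef>

// K&R form: the post-increments are part of the loop condition, so they
// also run on the evaluation that copies the terminating '\0'.
// (Hypothetical helper, for illustration only.)
std::ptrdiff_t steps_kr(const char* src, char* dst) {
    const char* start = src;
    while ((*dst++ = *src++)) {
    }
    return src - start;  // src ends one past the terminator
}

// Rewritten form: the increments sit in the loop body, so they are
// skipped once the terminator has been copied.
// (Hypothetical helper, for illustration only.)
std::ptrdiff_t steps_rewritten(const char* src, char* dst) {
    const char* start = src;
    while ((*dst = *src)) {
        ++src;
        ++dst;
    }
    return src - start;  // src ends on the terminator
}
```

For the string "abc", `steps_kr` returns 4 and `steps_rewritten` returns 3; for an empty string they return 1 and 0. Both copy the same bytes, but the pointers end up in different places, which is one plausible reason the optimizer does not treat the two loops as interchangeable.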

+10

Of course, the compiler has other options. "Copy the first byte then enter a loop if not 0" is what gcc-4.5.1 produces with -O1. With -O2 and -O3 it produces

    .LFB0:
        .cfi_startproc
        jmp     .L6             // jump to copy
        .p2align 4,,10
        .p2align 3
    .L4:
        addq    $1, %rdi        // increment pointers
        addq    $1, %rsi
    .L6:                        // copy
        movzbl  (%rdi), %eax    // get source byte
        testb   %al, %al        // check for 0
        movb    %al, (%rsi)     // move to dest
        jne     .L4             // loop if nonzero
        rep
        ret
        .cfi_endproc

which is very similar to what it produces for the K&R loop. Whether this is actually better, I can't say, but it looks nicer.

Apart from the jump into the loop, the instructions for the K&R loop are exactly the same, just ordered differently:

    .LFB0:
        .cfi_startproc
        .p2align 4,,10
        .p2align 3
    .L2:
        movzbl  (%rdi), %eax    // get source byte
        addq    $1, %rdi        // increment source pointer
        movb    %al, (%rsi)     // move byte to dest
        addq    $1, %rsi        // increment dest pointer
        testb   %al, %al        // check for 0
        jne     .L2             // loop if nonzero
        rep
        ret
        .cfi_endproc
+2

The second code does not "check for null again." In your second version, the loop body works with the character at address edx+eax+1 (note the +1), which gives characters 1, 2, 3, and so on. The code before the loop works with character 0. This means the code never checks the same character twice, as you seem to believe. There is no "again."

The second code is a bit more convoluted (the first iteration of the loop has effectively been pulled out in front of it) because, as has already been explained, its functionality is different. The final values of the pointers differ between your first and your second version.

0
