foreach vs. for: please explain the difference in the assembly code

I recently tested the performance of the for loop and the foreach loop in C#, and I noticed that when summing an array of ints into a long, the foreach loop can actually come out faster. Here is the full test program. I used Visual Studio 2012, x86, Release mode, with optimizations on.
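For reference, the two loops under comparison can be sketched as the pair of methods below. This is not the original test program; the class and method names (`LoopSum`, `SumForeach`, `SumFor`) are hypothetical, and only the loop bodies are taken from the source lines that appear in the disassembly.

```csharp
static class LoopSum
{
    // Sums the array with foreach, mirroring the source lines in the disassembly.
    public static long SumForeach(int[] collection)
    {
        long sum = 0;
        foreach (var i in collection)
        {
            sum += i;
        }
        return sum;
    }

    // Sums the array with an indexed for loop.
    public static long SumFor(int[] collection)
    {
        long sum = 0;
        for (int i = 0; i < collection.Length; ++i)
        {
            sum += collection[i];
        }
        return sum;
    }
}
```

Both methods should return the same value for any input; accumulating into a long matters once the total exceeds the range of int.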

Here is the assembly code for both loops. The foreach:

            long sum = 0;
    00000000  push        ebp
    00000001  mov         ebp,esp
    00000003  push        edi
    00000004  push        esi
    00000005  push        ebx
    00000006  xor         ebx,ebx
    00000008  xor         edi,edi
            foreach (var i in collection) {
    0000000a  xor         esi,esi
    0000000c  cmp         dword ptr [ecx+4],0
    00000010  jle         00000025
    00000012  mov         eax,dword ptr [ecx+esi*4+8]
                sum += i;
    00000016  mov         edx,eax
    00000018  sar         edx,1Fh
    0000001b  add         ebx,eax
    0000001d  adc         edi,edx
    0000001f  inc         esi
            foreach (var i in collection) {
    00000020  cmp         dword ptr [ecx+4],esi
    00000023  jg          00000012
            }
            return sum;
    00000025  mov         eax,ebx
    00000027  mov         edx,edi
    00000029  pop         ebx
    0000002a  pop         esi
    0000002b  pop         edi
    0000002c  pop         ebp
    0000002d  ret

And for:

            long sum = 0;
    00000000  push        ebp
    00000001  mov         ebp,esp
    00000003  push        edi
    00000004  push        esi
    00000005  push        ebx
    00000006  push        eax
    00000007  xor         ebx,ebx
    00000009  xor         edi,edi
            for (int i = 0; i < collection.Length; ++i) {
    0000000b  xor         esi,esi
    0000000d  mov         eax,dword ptr [ecx+4]
    00000010  mov         dword ptr [ebp-10h],eax
    00000013  test        eax,eax
    00000015  jle         0000002A
                sum += collection[i];
    00000017  mov         eax,dword ptr [ecx+esi*4+8]
    0000001b  cdq
    0000001c  add         eax,ebx
    0000001e  adc         edx,edi
    00000020  mov         ebx,eax
    00000022  mov         edi,edx
            for (int i = 0; i < collection.Length; ++i) {
    00000024  inc         esi
    00000025  cmp         dword ptr [ebp-10h],esi
    00000028  jg          00000017
            }
            return sum;
    0000002a  mov         eax,ebx
    0000002c  mov         edx,edi
    0000002e  pop         ecx
    0000002f  pop         ebx
    00000030  pop         esi
    00000031  pop         edi
    00000032  pop         ebp
    00000033  ret

As you can see, the main loop is 8 instructions for "foreach" and 9 instructions for "for". This translates into an approximately 10% performance difference in my benchmarks.

I'm not very good at reading assembly code, and I don't understand why the for loop would not be at least as efficient as foreach. What's going on here?

+8
performance assembly c# x86
3 answers

Since the array is so large, the only relevant part is obviously the one inside the loop:

    // for loop
    00000017  mov         eax,dword ptr [ecx+esi*4+8]
    0000001b  cdq
    0000001c  add         eax,ebx
    0000001e  adc         edx,edi
    00000020  mov         ebx,eax
    00000022  mov         edi,edx

    // foreach loop
    00000012  mov         eax,dword ptr [ecx+esi*4+8]
    00000016  mov         edx,eax
    00000018  sar         edx,1Fh
    0000001b  add         ebx,eax
    0000001d  adc         edi,edx

Since the sum is a long, it is stored in two different registers: ebx contains the least significant four bytes and edi contains the most significant four. The two loops differ in how collection[i] is (implicitly) converted from int to long:

    // for loop
    0000001b  cdq

    // foreach loop
    00000016  mov         edx,eax
    00000018  sar         edx,1Fh
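To make the equivalence concrete, here is a minimal C# sketch of what both instruction sequences compute. The names `WideAdd`, `SignHigh` and `AddInHalves` are hypothetical and exist only for illustration. `SignHigh` mirrors the arithmetic shift by 31 (`sar edx,1Fh`), which yields the same high dword that `cdq` produces, and `AddInHalves` imitates the `add`/`adc` pair on two 32-bit halves.

```csharp
static class WideAdd
{
    // High 32 bits of the sign-extended value:
    // 0 for non-negative inputs, -1 (all bits set) for negative inputs.
    public static int SignHigh(int value)
    {
        return value >> 31; // arithmetic shift, like "sar edx,1Fh"
    }

    // Adds a sign-extended int to a 64-bit sum that is held as two 32-bit halves.
    public static long AddInHalves(long sum, int value)
    {
        uint sumLow = unchecked((uint)sum);         // low half  (ebx in the listing)
        int sumHigh = unchecked((int)(sum >> 32));  // high half (edi in the listing)

        uint valueLow = unchecked((uint)value);
        int valueHigh = SignHigh(value);

        uint low = unchecked(sumLow + valueLow);            // add
        int carry = low < sumLow ? 1 : 0;                   // carry out of the low half
        int high = unchecked(sumHigh + valueHigh + carry);  // adc

        return ((long)high << 32) | low;
    }
}
```

For any inputs, `AddInHalves(sum, value)` should equal the ordinary wrapping `sum + value` on a long, which is the point: both loops perform the same 64-bit addition and differ only in how the high half of the operand is produced and where the result lands.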

Another important thing to note is that the for-loop version does the sum in "reversed order":

    long temp = (long) collection[i]; // implicit cast, stored in edx:eax
    temp += sum;                      // instead of "simply" sum += temp
    sum = temp;                       // sum is stored back into edi:ebx

I can't tell you why the compiler preferred this form over sum += temp (maybe @EricLippert can tell us :)), but I suspect it is related to instruction dependency issues that might arise.

+8

OK, so here is an annotated version of the assembly code. As you will see, the instructions in the two loops are very similar.

            foreach (var i in collection) {
    0000000a  xor         esi,esi                       ; clear index
    0000000c  cmp         dword ptr [ecx+4],0           ; get size of collection
    00000010  jle         00000025                      ; exit if empty
    00000012  mov         eax,dword ptr [ecx+esi*4+8]   ; get item from collection
                sum += i;
    00000016  mov         edx,eax                       ; move to edx:eax
    00000018  sar         edx,1Fh                       ; shift 31 bits to keep sign only
    0000001b  add         ebx,eax                       ; add to sum
    0000001d  adc         edi,edx                       ; add with carry from previous add
    0000001f  inc         esi                           ; increment index
            foreach (var i in collection) {
    00000020  cmp         dword ptr [ecx+4],esi         ; compare size to index
    00000023  jg          00000012                      ; loop if more
            }
            return sum;
    00000025  mov         eax,ebx                       ; result was in ebx
    =================================================
            for (int i = 0; i < collection.Length; ++i) {
    0000000b  xor         esi,esi                       ; clear index
    0000000d  mov         eax,dword ptr [ecx+4]         ; get limit on for
    00000010  mov         dword ptr [ebp-10h],eax       ; save limit
    00000013  test        eax,eax                       ; test if limit is empty
    00000015  jle         0000002A                      ; exit loop if empty
                sum += collection[i];
    00000017  mov         eax,dword ptr [ecx+esi*4+8]   ; get item from collection
    0000001b  cdq                                       ; convert eax to edx:eax
    0000001c  add         eax,ebx                       ; add to sum
    0000001e  adc         edx,edi                       ; add with carry from previous add
    00000020  mov         ebx,eax                       ; put result in edi:ebx
    00000022  mov         edi,edx
            for (int i = 0; i < collection.Length; ++i) {
    00000024  inc         esi                           ; increment index
    00000025  cmp         dword ptr [ebp-10h],esi       ; compare to limit
    00000028  jg          00000017                      ; loop if more
            }
            return sum;
    0000002a  mov         eax,ebx                       ; result was in ebx

+5

According to the C# Language Specification 4.0, a foreach statement is expanded by the compiler as follows:

foreach-statement:

    foreach ( local-variable-type identifier in expression ) embedded-statement

    {
        E e = ((C)(x)).GetEnumerator();
        try {
            V v;
            while (e.MoveNext()) {
                v = (V)(T)e.Current;
                embedded-statement
            }
        }
        finally {
            … // Dispose e
        }
    }

This happens after the following processing (again, from the specification):

• If the type X of expression is an array type, then there is an implicit reference conversion from X to the System.Collections.IEnumerable interface (since System.Array implements this interface). The collection type is the System.Collections.IEnumerable interface, the enumerator type is the System.Collections.IEnumerator interface, and the element type is the element type of the array type X.

This is probably a good reason why you don't see the same assembly code coming out of the compiler.
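As an illustration of the quoted expansion, the pattern can be written out by hand for an int array roughly as follows. This is a sketch, not compiler output; the names `ForeachExpansion` and `SumViaEnumerator` are hypothetical. Note that the foreach disassembly in the question contains a plain indexed loop with no calls to MoveNext, which suggests the generated code for arrays does not literally go through this path.

```csharp
using System;
using System.Collections;

static class ForeachExpansion
{
    // Hand-written version of the specification's expansion for an int[]:
    // collection type IEnumerable, enumerator type IEnumerator, element type int.
    public static long SumViaEnumerator(int[] collection)
    {
        long sum = 0;
        IEnumerator e = ((IEnumerable)collection).GetEnumerator();
        try
        {
            while (e.MoveNext())
            {
                int v = (int)e.Current; // Current is typed as object, so this unboxes
                sum += v;               // the embedded statement
            }
        }
        finally
        {
            // Dispose e if the enumerator happens to implement IDisposable.
            IDisposable d = e as IDisposable;
            if (d != null) d.Dispose();
        }
        return sum;
    }
}
```

The method returns the same total as a plain foreach over the array, but every element goes through an interface call and an unboxing conversion, which is quite different from the tight loop shown in the question's listing.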

-1
