Why does a local var reference cause a big performance degradation?

Consider the following simple program:

    using System;
    using System.Diagnostics;

    class Program
    {
        private static void Main(string[] args)
        {
            const int size = 10000000;
            var array = new string[size];

            var str = new string('a', 100);
            var sw = Stopwatch.StartNew();
            for (int i = 0; i < size; i++)
            {
                var str2 = new string('a', 100);

                //array[i] = str2; // This is slow
                array[i] = str; // This is fast
            }
            sw.Stop();
            Console.WriteLine("Took " + sw.ElapsedMilliseconds + "ms.");
        }
    }

If I run this, it is relatively fast. If I uncomment the "slow" line and comment out the "fast" line, it is more than 5 times slower. Note that in both situations the string "str2" is initialized inside the loop. This is not optimized away in either case (this can be verified by looking at the IL or the disassembly).

In both cases, the code seems to do the same amount of work. It needs to allocate and initialize the string, and then assign a reference to the array location. The only difference is whether that reference is the local var "str" or "str2".

Why does it make such a big difference in performance whether the reference assigned is "str" or "str2"?

If we look at the disassembly, there is a difference:

    (fast)
         var str2 = new string('a', 100);
    0000008e  mov         r8d,64h
    00000094  mov         dx,61h
    00000098  xor         ecx,ecx
    0000009a  call        000000005E393928
    0000009f  mov         qword ptr [rsp+58h],rax
    000000a4  nop

    (slow)
         var str2 = new string('a', 100);
    00000085  mov         r8d,64h
    0000008b  mov         dx,61h
    0000008f  xor         ecx,ecx
    00000091  call        000000005E383838
    00000096  mov         qword ptr [rsp+58h],rax
    0000009b  mov         rax,qword ptr [rsp+58h]
    000000a0  mov         qword ptr [rsp+38h],rax

The "slow" version has two additional "mov" instructions where the "fast" version has only a "nop".

Can someone explain what is going on here? It is hard to understand how two additional mov instructions can cause a slowdown of more than 5x, especially since I would expect most of the time to be spent on initializing the string. Thanks for any ideas.

+51
performance c#
May 09 '16 at 16:29
2 answers

You are correct that the code does about the same amount of work either way.

But the garbage collector does very different things in both cases.

In the str version, no more than two string instances are live at any given time. This means that almost all new objects die in generation 0, and nothing needs to be promoted to generation 1. Since generation 1 does not grow at all, the GC has no reason to attempt expensive "full collections".

In the str2 version, all the new string instances stay alive. The objects get promoted to higher generations (which may involve moving them in memory). In addition, since the higher generations are now growing, the GC will occasionally try to run full collections.
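The promotion described above can be observed directly with `GC.GetGeneration`. The following is a minimal sketch, not code from the question: the `PromotionDemo` class and the `TrackSurvivor` method are hypothetical names, and the forced `GC.Collect()` calls are there only to make the effect visible. The exact numbers printed depend on the runtime and the GC configuration, although on a typical setup you would expect the generation to go up after each collection the object survives.

```csharp
using System;

static class PromotionDemo
{
    // Records the generation of one surviving object before any collection
    // and after each of two forced collections.
    public static int[] TrackSurvivor()
    {
        var survivor = new string('a', 100);
        var generations = new int[3];

        generations[0] = GC.GetGeneration(survivor); // freshly allocated
        GC.Collect();
        generations[1] = GC.GetGeneration(survivor); // survived one collection
        GC.Collect();
        generations[2] = GC.GetGeneration(survivor); // survived two collections

        GC.KeepAlive(survivor); // keep the object reachable up to this point
        return generations;
    }

    static void Main()
    {
        Console.WriteLine(string.Join(" -> ", TrackSurvivor()));
    }
}
```

In the str2 version of the loop, every string stored in the array is such a survivor, so the collector has to do this work for millions of objects.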

Note that the running time of the .NET GC tends to depend linearly on the number of live objects: live objects must be traversed and moved out of the way, while dead objects cost nothing at all (they are simply overwritten the next time memory is allocated).

This means that str is the best case for the garbage collector, and str2 is the worst case.

Take a look at the GC performance counters for your program; I suspect you will see very different results between the two versions.
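If setting up performance counters is inconvenient, a rough version of the same check can be done from inside the program with `GC.CollectionCount`. This is a minimal sketch under some assumptions: the `GcProbe` class, the `Run` method and the `keepNewStrings` flag are names made up for illustration, and the array size is reduced from the question's 10,000,000 to keep the run short.

```csharp
using System;
using System.Diagnostics;

static class GcProbe
{
    // Runs the loop from the question and returns how many collections
    // happened per generation while it ran.
    // keepNewStrings = true  -> array[i] = str2 (the "slow" variant)
    // keepNewStrings = false -> array[i] = str  (the "fast" variant)
    public static int[] Run(int size, bool keepNewStrings, out long elapsedMs)
    {
        var array = new string[size];
        var str = new string('a', 100);

        // Snapshot the collection counters before the loop.
        var before = new int[GC.MaxGeneration + 1];
        for (int g = 0; g <= GC.MaxGeneration; g++)
            before[g] = GC.CollectionCount(g);

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < size; i++)
        {
            var str2 = new string('a', 100);
            array[i] = keepNewStrings ? str2 : str;
        }
        sw.Stop();
        elapsedMs = sw.ElapsedMilliseconds;

        // Collections that happened during the loop, per generation.
        var delta = new int[GC.MaxGeneration + 1];
        for (int g = 0; g <= GC.MaxGeneration; g++)
            delta[g] = GC.CollectionCount(g) - before[g];

        GC.KeepAlive(array); // keep the array reachable until the loop is done
        return delta;
    }

    static void Main()
    {
        const int size = 1000000; // assumption: smaller than in the question
        foreach (bool keep in new[] { false, true })
        {
            long ms;
            int[] counts = Run(size, keep, out ms);
            Console.WriteLine((keep ? "str2" : "str ") + ": " + ms
                + "ms, collections per generation: " + string.Join("/", counts));
        }
    }
}
```

When reading the numbers, keep in mind that a collection of a higher generation also collects the lower ones, so it is included in their counts as well. Any generation 1 and generation 2 collections in the str2 run are the interesting part.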

+76
May 9 '16 at 16:55

No, the local reference is not what slows it down.

What makes it slow is that a lot of new string instances are created, and strings are classes. The fast version, by contrast, stores the same instance every time. That assignment can also be optimized, while the constructor call cannot be.

+1
May 9 '16 at 16:38