Optimizing away array bounds checks in a loop

    var ar = new int[500000000];
    var sw = new Stopwatch();
    sw.Start();
    var length = ar.Length;
    for (var i = 0; i < length; i++)
    {
        if (ar[i] == 0) ;
    }
    sw.Stop();

sw.ElapsedMilliseconds: ~2930 ms

    var ar = new int[500000000];
    var sw = new Stopwatch();
    sw.Start();
    for (var i = 0; i < ar.Length; i++)
    {
        if (ar[i] == 0) ;
    }
    sw.Stop();

sw.ElapsedMilliseconds: ~3520 ms

Win8 x64, VS12, .NET 4.5, Release build, "Optimize code" enabled.

As far as I know, the second version should be faster, because the JIT can eliminate the array bounds check when the loop condition tests against ar.Length directly. Did I miss something?

+4
4 answers

I also use Win8 x64, .NET 4.5, Release build, outside the debugger (this is important); I get:

    0: 813ms vs 421ms
    1: 439ms vs 420ms
    2: 440ms vs 420ms
    3: 431ms vs 429ms
    4: 433ms vs 427ms
    5: 424ms vs 437ms
    6: 427ms vs 434ms
    7: 430ms vs 432ms
    8: 432ms vs 435ms
    9: 430ms vs 430ms
    10: 427ms vs 418ms
    11: 422ms vs 421ms
    12: 434ms vs 420ms
    13: 439ms vs 425ms
    14: 426ms vs 429ms
    15: 426ms vs 426ms
    16: 417ms vs 432ms
    17: 442ms vs 425ms
    18: 420ms vs 429ms
    19: 420ms vs 422ms

The first run pays the JIT/warm-up cost, but overall the two are about the same (one column or the other looks slightly faster in individual runs, but there is nothing conclusive either way).

    using System;
    using System.Diagnostics;

    static class Program
    {
        static void Main()
        {
            var ar = new int[500000000];
            for (int j = 0; j < 20; j++)
            {
                var sw = Stopwatch.StartNew();
                var length = ar.Length;
                for (var i = 0; i < length; i++)
                {
                    if (ar[i] == 0) ;
                }
                sw.Stop();
                long hoisted = sw.ElapsedMilliseconds;

                sw = Stopwatch.StartNew();
                for (var i = 0; i < ar.Length; i++)
                {
                    if (ar[i] == 0) ;
                }
                sw.Stop();
                long direct = sw.ElapsedMilliseconds;

                Console.WriteLine("{0}: {1}ms vs {2}ms", j, hoisted, direct);
            }
        }
    }
+6

I looked into this a little more and found it surprisingly hard to write a test that actually shows the effect of bounds-check elimination.

First, some problems with the old standard:

  • Disassembly showed that the JIT compiler was able to hoist the length in the first version as well. That was a surprise to me, but the disassembly doesn't lie. This, of course, completely defeats the purpose of the test. Fix: pass length in as a function argument.
  • The array is so large that cache misses add a lot of noise to our signal. Fix: use a small array, but loop over it several times.
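Putting both fixes together, a benchmark along those lines might look like this (my own sketch, not the answerer's exact code; the NoInlining attribute is my addition, to keep the JIT from folding the length argument back into ar.Length at the call site):

```csharp
using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;

static class Program
{
    // NoInlining prevents the JIT from seeing that length == ar.Length
    // at the call site and optimizing accordingly.
    [MethodImpl(MethodImplOptions.NoInlining)]
    static long TestHoisted(int[] ar, int length, int passes)
    {
        long sum = 0;
        for (int p = 0; p < passes; p++)
            for (int i = 0; i < length; i++)
                sum += ar[i];
        return sum;
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    static long TestDirect(int[] ar, int passes)
    {
        long sum = 0;
        for (int p = 0; p < passes; p++)
            for (int i = 0; i < ar.Length; i++)
                sum += ar[i];
        return sum;
    }

    static void Main()
    {
        var ar = new int[4096]; // small enough to stay in cache
        const int passes = 100000;

        var sw = Stopwatch.StartNew();
        TestHoisted(ar, ar.Length, passes);
        sw.Stop();
        Console.WriteLine("hoisted: {0}ms", sw.ElapsedMilliseconds);

        sw = Stopwatch.StartNew();
        TestDirect(ar, passes);
        sw.Stop();
        Console.WriteLine("direct:  {0}ms", sw.ElapsedMilliseconds);
    }
}
```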

But now the real problem: the JIT is too smart. There is no bounds check in the inner loop, even when the loop length comes from a function argument. The generated code differs between the two versions, but the inner loop is essentially the same. Not identical (different registers, etc.), but it follows the same pattern:

    _loop:
        mov eax, [somewhere + index]
        add index, 4
        cmp index, end
        jl  _loop

There is no significant difference in runtime because there is no significant difference in the generated code.
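For contrast, when the JIT cannot prove the index is in range, the check does survive. A hypothetical variation of mine: indexing with a manually wrapped counter keeps a test-and-branch on every access, because the range invariant is maintained by control flow the JIT doesn't reason about:

```csharp
using System;

static class Program
{
    static long SumWrapped(int[] ar, int start, int count)
    {
        long sum = 0;
        int idx = start;
        for (int n = 0; n < count; n++)
        {
            // idx is kept in [0, ar.Length) manually, but the JIT
            // cannot prove that, so each ar[idx] carries a bounds check.
            sum += ar[idx];
            idx++;
            if (idx == ar.Length) idx = 0;
        }
        return sum;
    }

    static void Main()
    {
        var ar = new int[8];
        for (int i = 0; i < ar.Length; i++) ar[i] = i;
        // 16 accesses over 8 elements = two full cycles, whatever the start.
        Console.WriteLine(SumWrapped(ar, 3, 16)); // 2 * (0+1+...+7) = 56
    }
}
```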

+4

I think the answer is that the garbage collector is running and skewing your timings.

Disclaimer: I can't see the full context of the OP's code because you didn't post a complete compilable example; I'm assuming that you are re-allocating the array rather than reusing it. If not, this isn't the right answer!

Consider this code:

    using System;
    using System.Diagnostics;

    namespace Demo
    {
        internal class Program
        {
            private static void Main(string[] args)
            {
                var ar = new int[500000000];
                test1(ar);
                //ar = new int[500000000]; // Uncomment this line.
                test2(ar);
            }

            private static void test1(int[] ar)
            {
                var sw = new Stopwatch();
                sw.Start();
                var length = ar.Length;
                for (var i = 0; i < length; i++)
                {
                    if (ar[i] == 0) ;
                }
                sw.Stop();
                Console.WriteLine("test1 took " + sw.Elapsed);
            }

            private static void test2(int[] ar)
            {
                var sw = new Stopwatch();
                sw.Start();
                for (var i = 0; i < ar.Length; i++)
                {
                    if (ar[i] == 0) ;
                }
                sw.Stop();
                Console.WriteLine("test2 took " + sw.Elapsed);
            }
        }
    }

On my system, it prints:

    test1 took 00:00:00.6643788
    test2 took 00:00:00.3516378

If I uncomment the line marked // Uncomment this line., the timings change to:

    test1 took 00:00:00.6615819
    test2 took 00:00:00.6806489

This is because the GC collects the previous array.
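You can make the collection visible with GC.CollectionCount. A sketch of mine (using smaller arrays than the original, to keep peak memory down; the exact count will vary by machine and runtime, so the code just reports it rather than asserting a value):

```csharp
using System;

static class Program
{
    static void Main()
    {
        var ar = new int[100000000]; // ~400 MB, goes on the large object heap
        int before = GC.CollectionCount(2);

        // Dropping the only reference and allocating a second large array
        // gives the GC both a reason and an opportunity to collect the first.
        ar = new int[100000000];

        int after = GC.CollectionCount(2);
        Console.WriteLine("gen-2 collections during reallocation: {0}", after - before);
        GC.KeepAlive(ar);
    }
}
```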

[EDIT] To avoid paying the JIT cost, I put the whole test in a loop:

    for (int i = 0; i < 8; ++i)
    {
        test1(ar);
        //ar = new int[500000000]; // Uncomment this line.
        test2(ar);
    }

And then my results without the second array allocation:

    test1 took 00:00:00.6437912
    test2 took 00:00:00.3534027
    test1 took 00:00:00.3401437
    test2 took 00:00:00.3486296
    test1 took 00:00:00.3470775
    test2 took 00:00:00.3675475
    test1 took 00:00:00.3501221
    test2 took 00:00:00.3549338
    test1 took 00:00:00.3427057
    test2 took 00:00:00.3574063
    test1 took 00:00:00.3566458
    test2 took 00:00:00.3462722
    test1 took 00:00:00.3430952
    test2 took 00:00:00.3464017
    test1 took 00:00:00.3449196
    test2 took 00:00:00.3438316

And with the second array allocation enabled:

    test1 took 00:00:00.6572665
    test2 took 00:00:00.6565778
    test1 took 00:00:00.3576911
    test2 took 00:00:00.6910897
    test1 took 00:00:00.3464013
    test2 took 00:00:00.6638542
    test1 took 00:00:00.3548638
    test2 took 00:00:00.6897472
    test1 took 00:00:00.4464020
    test2 took 00:00:00.7739877
    test1 took 00:00:00.3835624
    test2 took 00:00:00.8432918
    test1 took 00:00:00.3496910
    test2 took 00:00:00.6471341
    test1 took 00:00:00.3486505
    test2 took 00:00:00.6527160

Note that test2 consistently takes longer because of the GC.

Unfortunately, the GC makes the timing results pretty meaningless.

For example, if I change the test code to this:

    for (int i = 0; i < 8; ++i)
    {
        var ar = new int[500000000];
        GC.Collect();
        test1(ar);
        //ar = new int[500000000]; // Uncomment this line.
        test2(ar);
    }

With the line commented out, I get:

    test1 took 00:00:00.6354278
    test2 took 00:00:00.3464486
    test1 took 00:00:00.6672933
    test2 took 00:00:00.3413958
    test1 took 00:00:00.6724916
    test2 took 00:00:00.3530412
    test1 took 00:00:00.6606178
    test2 took 00:00:00.3413083
    test1 took 00:00:00.6439316
    test2 took 00:00:00.3404499
    test1 took 00:00:00.6559153
    test2 took 00:00:00.3413563
    test1 took 00:00:00.6955377
    test2 took 00:00:00.3364670
    test1 took 00:00:00.6580798
    test2 took 00:00:00.3378203

And with it uncommented:

    test1 took 00:00:00.6340203
    test2 took 00:00:00.6276153
    test1 took 00:00:00.6813719
    test2 took 00:00:00.6264782
    test1 took 00:00:00.6927222
    test2 took 00:00:00.6269447
    test1 took 00:00:00.7010559
    test2 took 00:00:00.6262000
    test1 took 00:00:00.6975080
    test2 took 00:00:00.6457846
    test1 took 00:00:00.6796235
    test2 took 00:00:00.6341214
    test1 took 00:00:00.6823508
    test2 took 00:00:00.6455403
    test1 took 00:00:00.6856985
    test2 took 00:00:00.6430923

I think the moral of this test is: for this particular test, the GC is such a large overhead compared to the rest of the code that it completely distorts the timing results, which therefore can't be trusted to mean anything.
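One way to take the GC out of the picture entirely is GC.TryStartNoGCRegion. Note this is my suggestion, not something available in the .NET 4.5 used above: the API was added in .NET Framework 4.6, and the requested budget must fit in the ephemeral segment, so it only works for allocations far smaller than the 500,000,000-element array in the question. A minimal sketch under those assumptions:

```csharp
using System;
using System.Diagnostics;

static class Program
{
    static void Main()
    {
        // Ask the runtime to guarantee no collections occur while we
        // allocate and measure (requires .NET 4.6+).
        if (!GC.TryStartNoGCRegion(64 * 1024 * 1024))
        {
            Console.WriteLine("no-GC region not available");
            return;
        }
        try
        {
            var ar = new int[8 * 1024 * 1024]; // 32 MB, within the budget

            var sw = Stopwatch.StartNew();
            long sum = 0;
            for (int i = 0; i < ar.Length; i++)
                sum += ar[i];
            sw.Stop();

            Console.WriteLine("sum={0}, {1}ms", sum, sw.ElapsedMilliseconds);
        }
        finally
        {
            // Ends the region; the next collection proceeds normally.
            GC.EndNoGCRegion();
        }
    }
}
```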

+1

In the second version you call the ar.Length property on every iteration, which is why it is slower.

0
