Why is System/mscorlib code so much faster, especially in loops?

This is just a personal project I dove into. Basically, I am parsing a text file (anywhere from 20 MB to 1 GB) using StreamReader. The performance is pretty solid, but still ... I was itching to find out what would happen if I parsed it in binary. Don't get me wrong, I am not prematurely optimizing; I am deliberately micro-optimizing just to "see."

So, I am reading the text file using byte arrays. It turns out newlines can be the (Windows) standard CR/LF, or just CR, or just LF ... pretty messy. I was hoping to use Array.IndexOf to find a CR and then skip past the LF. Instead, I found myself writing code very similar to IndexOf, except checking for both characters and returning each line as a byte array.

So, the bottom line: using code very similar to IndexOf, mine still ends up insanely slower. To put numbers on it, against an 800 MB file:

  • Using IndexOf and searching for CR: ~320 MB/s
  • Using StreamReader and ReadLine: ~180 MB/s
  • Replicating IndexOf with a for loop: ~150 MB/s

Here is the for-loop code (~150 MB/s):

    IEnumerator<byte[]> IEnumerable<byte[]>.GetEnumerator()
    {
        using(FileStream fs = new FileStream(_path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite, _bufferSize))
        {
            byte[] buffer = new byte[_bufferSize];
            int bytesRead;
            int overflowCount = 0;

            while((bytesRead = fs.Read(buffer, overflowCount, buffer.Length - overflowCount)) > 0)
            {
                int bufferLength = bytesRead + overflowCount;
                int lastPos = 0;

                for(int i = 0; i < bufferLength; i++)
                {
                    // 13 = CR, 10 = LF
                    if(buffer[i] == 13 || buffer[i] == 10)
                    {
                        int length = i - lastPos;
                        if(length > 0)
                        {
                            byte[] line = new byte[length];
                            Array.Copy(buffer, lastPos, line, 0, length);
                            yield return line;
                        }
                        lastPos = i + 1;
                    }
                }

                // Carry any partial line at the end of the buffer to the
                // front so the next Read appends to it.
                if(lastPos > 0)
                {
                    overflowCount = bufferLength - lastPos;
                    Array.Copy(buffer, lastPos, buffer, 0, overflowCount);
                }
            }
        }
    }

And here is the faster, IndexOf-based block (~320 MB/s):

    while((bytesRead = fs.Read(buffer, overflowCount, buffer.Length - overflowCount)) > 0)
    {
        int bufferLength = bytesRead + overflowCount;
        int pos = 0;
        int lastPos = 0;

        while(pos < bufferLength && (pos = Array.IndexOf<byte>(buffer, 13, pos)) != -1)
        {
            int length = pos - lastPos;
            if(length > 0)
            {
                byte[] line = new byte[length];
                Array.Copy(buffer, lastPos, line, 0, length);
                yield return line;
            }

            // Treat CR/LF as a single newline: skip the LF following a CR.
            if(pos < bufferLength - 1 && buffer[pos + 1] == 10)
                pos++;
            lastPos = ++pos;
        }

        if(lastPos > 0)
        {
            overflowCount = bufferLength - lastPos;
            Array.Copy(buffer, lastPos, buffer, 0, overflowCount);
        }
    }

(No, this is not production-ready; certain cases will make it explode. I use a 128 KB buffer to sidestep most of them.)
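For context, roughly how this enumerator gets consumed in the tests; RawLineReader is just a placeholder name for whatever class hosts the GetEnumerator above (taking the path and buffer size in its constructor):

    using System;
    using System.Collections.Generic;

    class Demo
    {
        static void Main()
        {
            // RawLineReader is a hypothetical stand-in for the class
            // hosting the GetEnumerator shown above.
            IEnumerable<byte[]> lines = new RawLineReader(@"C:\data\test.log", 128 * 1024);
            long count = 0;
            foreach (byte[] line in lines)
                count++; // each line arrives as raw bytes; no string decoding
            Console.WriteLine(count);
        }
    }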

So my big question is: why is Array.IndexOf so much faster? It is essentially the same for loop walking an array. Is there something about how mscorlib code gets executed? Even modifying the code above to truly replicate IndexOf (searching only for CR, then skipping the LF afterwards, exactly as I do with the IndexOf version) doesn't help. Errr ... I've been through various permutations, and it's late enough that maybe there's some glaring mistake I'm missing?

By the way, I looked into ReadLine and noticed it uses a switch block rather than an if block ... oddly, when I do the same, it improves performance by about 15 MB/s. That's another question for another time (why is switch faster than if?), but I figured I'd point out that I did look at it.
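For concreteness, a minimal sketch of the two shapes I mean (my own illustration, not StreamReader's actual source); the hypothesis is that the JIT can sometimes collapse adjacent switch cases into a single range check instead of two sequential compare-and-branch pairs:

    class NewlineCheck
    {
        public static bool IsNewlineIf(byte b)
        {
            // Short-circuited comparisons: typically two compare/branch pairs.
            return b == 13 || b == 10;
        }

        public static bool IsNewlineSwitch(byte b)
        {
            // Adjacent cases: the JIT may emit a single range test here.
            switch (b)
            {
                case 10:
                case 13:
                    return true;
                default:
                    return false;
            }
        }
    }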

Also, I am testing release builds outside of Visual Studio, so no debugger is attached.

2 answers

That's a good question. The short version is that it comes down to the IEqualityComparer implementation that IndexOf ends up using. Let's look at the following code snippet:

    using System;
    using System.Collections.Generic;
    using System.Diagnostics;

    class Program
    {
        static int[] buffer = new int[1024];
        const byte mark = 42;
        const int iterations = 10000;

        static void Main()
        {
            buffer[buffer.Length - 1] = mark;
            Console.WriteLine(EqualityComparer<int>.Default.GetType());
            Console.WriteLine("Custom: {0}", Time(CustomIndexOf));
            Console.WriteLine("Builtin: {0}", Time(ArrayIndexOf));
        }

        static TimeSpan Time(Action action)
        {
            var watch = new Stopwatch();
            watch.Start();
            for (int i = 0; i < iterations; i++)
                action();
            watch.Stop();
            return watch.Elapsed;
        }

        static void CustomIndexOf()
        {
            for (int i = 0; i < buffer.Length; i++)
                if (buffer[i] == mark)
                    break;
        }

        static void ArrayIndexOf()
        {
            Array.IndexOf(buffer, mark);
        }
    }

You need to compile it with csc /optimize+.

Here is the result:

    C:\Tmp>test
    System.Collections.Generic.GenericEqualityComparer`1[System.Int32]
    Custom: 00:00:00.0386403
    Builtin: 00:00:00.0427903

Now change the array type to byte (so that EqualityComparer<byte>.Default kicks in), and here is the result:

    C:\Tmp>test
    System.Collections.Generic.ByteEqualityComparer
    Custom: 00:00:00.0387158
    Builtin: 00:00:00.0165881

As you can see, byte arrays get special treatment: ByteEqualityComparer is presumably optimized for finding a byte within a byte array. Since I can't decompile the .NET Framework any further, I stopped the analysis here, but I think it's a pretty good hint.
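To illustrate the kind of trick a byte-specialized scan can pull (a hedged sketch of the general technique, not the actual mscorlib code, which appears to hand off to a native helper): it can unroll the loop and test several bytes per iteration, something a generic comparer-driven loop cannot do:

    using System;

    class ByteScanDemo
    {
        // Illustrative only: a hand-unrolled scan in the spirit of a
        // specialized byte search. NOT the real mscorlib implementation.
        static int IndexOfByteUnrolled(byte[] buffer, byte value, int start)
        {
            int i = start;
            int last = buffer.Length - 3;
            // Test four bytes per iteration to amortize loop overhead
            // (increment, compare, branch) across several comparisons.
            for (; i < last; i += 4)
            {
                if (buffer[i] == value) return i;
                if (buffer[i + 1] == value) return i + 1;
                if (buffer[i + 2] == value) return i + 2;
                if (buffer[i + 3] == value) return i + 3;
            }
            // Pick up the 0-3 leftover bytes.
            for (; i < buffer.Length; i++)
                if (buffer[i] == value) return i;
            return -1;
        }

        static void Main()
        {
            byte[] data = new byte[1024];
            data[1000] = 13; // plant a CR
            Console.WriteLine(IndexOfByteUnrolled(data, 13, 0)); // 1000
        }
    }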


The mscorlib assemblies are ngen'd during installation. Try ngen'ing your assembly with the Ngen.exe utility (it ships with the .NET Framework) and then rerun your benchmarks. It may be a little faster.

For your .NET code to run at comparable speeds, Microsoft recommends that you Ngen your assemblies during your application's installation.
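For example (MyApp.exe is a placeholder; run this from a Visual Studio or Windows SDK command prompt, since ngen.exe lives in the framework directory):

    ngen install MyApp.exe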

