While writing a simple library to parse a game's data files, I noticed that reading an entire data file into memory and parsing it from there was significantly faster (by up to 15x: 106 s vs. 7 s).
Parsing is mostly sequential, but now and then a seek is needed to read data stored elsewhere in the file, referenced by an offset.
I understand that parsing from memory will certainly be faster, but something is wrong if the difference is this large. I wrote some code to simulate this:
    public static void Main(string[] args)
    {
        Stopwatch n = new Stopwatch();

        // Pass 1: load the whole file into memory, then read from a MemoryStream.
        n.Start();
        byte[] b = File.ReadAllBytes(@"D:\Path\To\Large\File");
        using (MemoryStream s = new MemoryStream(b, false))
            RandomRead(s);
        n.Stop();
        Console.WriteLine("Memory read done in {0}.", n.Elapsed);

        b = null;

        // Pass 2: read the same file directly through a FileStream.
        n.Reset();
        n.Start();
        using (FileStream s = File.Open(@"D:\Path\To\Large\File", FileMode.Open))
            RandomRead(s);
        n.Stop();
        Console.WriteLine("File read done in {0}.", n.Elapsed);

        Console.ReadLine();
    }

    private static void RandomRead(Stream s) {
As input I used one of the game's data files. This file was about 102 MB, and it produced this result:

    Memory read done in 00:00:03.3092618.
    File read done in 00:00:32.6495245.

That makes reading from memory roughly 10 times faster than reading from the file.
The memory read was deliberately done before the file read, so that the file read could benefit from the file cache. It is still much slower.
I tried increasing and decreasing the FileStream buffer size; nothing yielded significantly better results, and increasing or decreasing it too much only made things slower.
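A minimal sketch of this kind of buffer-size experiment follows. The class name BufferSizeProbe, the fixed random seed, the 16-byte read size, and the list of buffer sizes are all assumptions chosen for illustration; they are not the values or access pattern from the test above.

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Hypothetical helper: times the same seek-then-read pattern for several FileStream buffer sizes.
public static class BufferSizeProbe
{
    // Performs `reads` small reads at pseudo-random offsets and returns the elapsed time.
    // The seed is fixed so that every buffer size sees the same sequence of offsets.
    public static TimeSpan TimeSeeks(string path, int bufferSize, int reads)
    {
        var rng = new Random(12345);
        var chunk = new byte[16];
        var watch = Stopwatch.StartNew();

        // FileOptions.RandomAccess hints to the OS that access will not be sequential.
        using (var s = new FileStream(path, FileMode.Open, FileAccess.Read,
                                      FileShare.Read, bufferSize, FileOptions.RandomAccess))
        {
            long limit = Math.Max(1, s.Length - chunk.Length);
            for (int i = 0; i < reads; i++)
            {
                s.Position = (long)(rng.NextDouble() * limit);
                s.Read(chunk, 0, chunk.Length);
            }
        }

        watch.Stop();
        return watch.Elapsed;
    }

    public static void Main(string[] args)
    {
        if (args.Length == 0)
        {
            Console.WriteLine("Usage: BufferSizeProbe <path-to-data-file>");
            return;
        }

        // Illustrative buffer sizes only; 4096 is the FileStream default.
        foreach (int size in new[] { 512, 4096, 65536, 1048576 })
            Console.WriteLine("{0,8} bytes: {1}", size, TimeSeeks(args[0], size, 100000));
    }
}
```

Every seek that lands outside the currently buffered range discards the buffer and forces a refill, so a larger buffer can make a seek-heavy pattern slower rather than faster.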
Is there something I'm doing wrong, or is this to be expected? Is there any way to at least make the slowdown less significant?
Why is reading the entire file at once and then parsing it so much faster than reading and parsing at the same time?
I actually compared against a similar library written in C++ that uses the native Windows CreateFileMapping and MapViewOfFile functions to read files, and it is very fast. Could the constant switching between managed and unmanaged code, and the marshaling involved, be what causes this?
I also tried MemoryMappedFile in .NET 4; the gain was only one second.
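For reference, here is a minimal sketch of offset-based reads through a MemoryMappedFile, assuming a single read-only view over the whole file that stays open for the lifetime of the parser. The class name MappedDataFile and its two read methods are hypothetical, and this is not the code that was timed above.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;

// Hypothetical wrapper: maps the file once and serves reads at arbitrary offsets.
public sealed class MappedDataFile : IDisposable
{
    private readonly MemoryMappedFile _file;
    private readonly MemoryMappedViewAccessor _view;

    public MappedDataFile(string path)
    {
        // Capacity 0 means "use the file's current size"; the file must not be empty.
        _file = MemoryMappedFile.CreateFromFile(path, FileMode.Open, null, 0,
                                                MemoryMappedFileAccess.Read);
        // Offset 0 and size 0 map the whole file into one view.
        _view = _file.CreateViewAccessor(0, 0, MemoryMappedFileAccess.Read);
    }

    // Reads a 32-bit integer in the machine's native byte order.
    public int ReadInt32(long offset)
    {
        return _view.ReadInt32(offset);
    }

    // Fills `buffer` starting at `offset`; returns the number of bytes copied.
    public int ReadBytes(long offset, byte[] buffer)
    {
        return _view.ReadArray(offset, buffer, 0, buffer.Length);
    }

    public void Dispose()
    {
        _view.Dispose();
        _file.Dispose();
    }
}
```

Usage would look like `using (var data = new MappedDataFile(path)) { int value = data.ReadInt32(offset); }`. The mapping is created once and reused, since creating a new mapping or view for every read would add overhead to each call.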