Serializing a large array to disk in C#

I have a very large graph stored in a one-dimensional array (about 1.1 GB), which I can hold in memory on my machine running Windows XP with 2 GB of RAM and 2 GB of virtual memory. I can generate the entire data set in memory; however, when I try to serialize it to disk using BinaryFormatter, the file grows to about 50 MB and then I get an out-of-memory exception. The code I use for writing is the same code I use for all of my smaller problem instances:

    StateInformation[] diskReady = GenerateStateGraph();
    BinaryFormatter bf = new BinaryFormatter();
    using (Stream file = File.OpenWrite(@"C:\temp\states.dat"))
    {
        bf.Serialize(file, diskReady);
    }

The search algorithm is very lightweight, and I can run searches over this graph without any problems while it is in memory.

I really have 3 questions:

  • Is there a more reliable way to write a large data set to disk? I suppose "large" could be defined as the point where the size of the data set approaches the amount of available memory, though I'm not sure how accurate that definition is.

  • Should I switch to a more database-oriented approach?

  • Can anyone point me to literature on reading parts of a large data set from a disk file in C#?

3 answers

Write entries to the file yourself. One simple solution would look like this:

    StateInformation[] diskReady = GenerateStateGraph();
    BinaryFormatter bf = new BinaryFormatter();
    using (Stream file = File.OpenWrite(@"C:\temp\states.dat"))
    {
        foreach (StateInformation si in diskReady)
        {
            using (MemoryStream ms = new MemoryStream())
            {
                bf.Serialize(ms, si);  // serialize one item, not the whole array
                byte[] ser = ms.ToArray();
                int len = ser.Length;
                // length prefix, low byte first
                file.WriteByte((byte)(len & 0xFF));
                file.WriteByte((byte)((len >> 8) & 0xFF));
                file.WriteByte((byte)((len >> 16) & 0xFF));
                file.WriteByte((byte)((len >> 24) & 0xFF));
                file.Write(ser, 0, len);
            }
        }
    }

This way, you never need more memory than it takes to serialize a single StateInformation object. To deserialize, you read four bytes, reconstruct the length, allocate a buffer of that size, fill it, and deserialize the item.
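For completeness, here is a minimal sketch of the matching read loop, assuming the length-prefixed, low-byte-first layout written above and the usual System.IO / System.Collections.Generic usings; collecting into a List is just one way to gather the results:

    List<StateInformation> items = new List<StateInformation>();
    BinaryFormatter bf = new BinaryFormatter();
    using (Stream file = File.OpenRead(@"C:\temp\states.dat"))
    {
        int b0;
        // Stop when the first length byte hits end-of-stream.
        while ((b0 = file.ReadByte()) != -1)
        {
            int b1 = file.ReadByte();
            int b2 = file.ReadByte();
            int b3 = file.ReadByte();
            int len = b0 | (b1 << 8) | (b2 << 16) | (b3 << 24);

            byte[] buffer = new byte[len];
            int read = 0;
            while (read < len)
                read += file.Read(buffer, read, len - read); // Read may return fewer bytes than asked

            using (MemoryStream ms = new MemoryStream(buffer))
                items.Add((StateInformation)bf.Deserialize(ms));
        }
    }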

All of the above can be heavily optimized for speed, memory usage, and disk size if you create a more specialized format, but the principle is as given above.


My experience with large data sets like this is to write them to disk manually rather than using the built-in serialization.

It may not be practical depending on how complex the StateInformation class is, but if it is fairly simple you can manually write/read the binary data using BinaryReader and BinaryWriter. These let you read and write most value types directly to the stream, in a predefined order determined by your code; see the sketch after the next paragraph.

This option should let you read and write your data quickly, though it is inconvenient if you later want to add fields to StateInformation or remove them, since you will have to manage upgrading the files.
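As a minimal sketch, assuming StateInformation has just two simple fields (NodeId and Cost are hypothetical; the question does not show the real members):

    // Hypothetical layout; substitute the real StateInformation fields.
    public struct StateInformation
    {
        public int NodeId;
        public double Cost;
    }

    // Writing: a record-count header, then one fixed-layout record per element.
    using (BinaryWriter writer = new BinaryWriter(File.Create(@"C:\temp\states.dat")))
    {
        writer.Write(states.Length);
        foreach (StateInformation si in states)
        {
            writer.Write(si.NodeId);
            writer.Write(si.Cost);
        }
    }

    // Reading: the same fields, in the same order.
    using (BinaryReader reader = new BinaryReader(File.OpenRead(@"C:\temp\states.dat")))
    {
        int count = reader.ReadInt32();
        StateInformation[] loaded = new StateInformation[count];
        for (int i = 0; i < count; i++)
        {
            loaded[i].NodeId = reader.ReadInt32();
            loaded[i].Cost = reader.ReadDouble();
        }
    }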


What is contained in StateInformation? Is it a class? A struct?

If you just want an easy-to-use container format that serializes to disk easily, create a typed DataSet, store the information in the DataSet, and then use its WriteXml() method to save it to disk. You can then create an empty DataSet and use ReadXml() to load the contents back into memory.
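A rough sketch with an untyped DataSet (the "States" table and its columns are made up for illustration; requires System.Data):

    DataTable table = new DataTable("States");
    table.Columns.Add("NodeId", typeof(int));    // hypothetical columns
    table.Columns.Add("Cost", typeof(double));

    DataSet ds = new DataSet("StateGraph");
    ds.Tables.Add(table);
    // ... fill table.Rows from the state array ...
    ds.WriteXml(@"C:\temp\states.xml", XmlWriteMode.WriteSchema); // keep the schema so types survive

    // Later: load it back.
    DataSet loaded = new DataSet();
    loaded.ReadXml(@"C:\temp\states.xml");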

If StateInformation is a struct containing value types, you can look at MemoryMappedFile to save and use the contents of the array by referencing the file directly, treating it as memory. This approach is rather more involved than the DataSet one, but has its own set of advantages.
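A minimal sketch using System.IO.MemoryMappedFiles (.NET 4+), assuming StateInformation is a blittable struct of value types; the element index and map name here are made up:

    int size = Marshal.SizeOf(typeof(StateInformation)); // System.Runtime.InteropServices
    long capacity = (long)size * diskReady.Length;

    using (MemoryMappedFile mmf = MemoryMappedFile.CreateFromFile(
               @"C:\temp\states.mmf", FileMode.Create, "states", capacity))
    using (MemoryMappedViewAccessor accessor = mmf.CreateViewAccessor(0, capacity))
    {
        // Copy the structs straight into the file-backed view.
        accessor.WriteArray(0, diskReady, 0, diskReady.Length);

        // Any single element can later be read back without loading the whole array.
        StateInformation one;
        accessor.Read((long)size * 42, out one);
    }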

