C# / .NET - Custom Binary File Formats - Where to Start?

I need to be able to store some data in a custom binary file format. I have never designed my own format before. It should be a friendly format for moving data between the worlds of C#, Java, and Ruby/Perl/Python.

To start with, the file will consist of entries, each with a GUID field and a JSON/YAML/XML payload field. I'm not sure what to use as delimiters. Separating items with a comma, tab, or newline seems too fragile. What does Excel do, or the pre-XML OpenOffice formats? Should I use ASCII characters 0 or 1? I don't know where to start. Are there any articles or books on the topic?

The format may later be extended to include a header section.

Note: I will start out working in .NET, but I would like the format to be easily portable.

UPDATE:
Processing the payloads can be slow, but navigating the file cannot. Therefore, I don't think XML will work.

5 answers

I'll try to add some general hints for creating a portable binary file format.

Keep in mind that inventing a binary file format means documenting how the bits are laid out and what they mean. It's documentation, not coding.

Now the hints:

  • Decide how to handle endianness. A good, easy approach is to decide it once and for all by fixing one byte order for the format. To avoid conversions (for performance), little-endian is the preferable choice if the files will mostly be used on common PCs (i.e., x86).

  • Create a header. Yes, it's a good idea to always have a header. The first bytes of the file should tell you which format you are dealing with.

    • Start with a magic number so you can recognize your format (a short ASCII string will do the trick).
    • Add a version. Adding a format version won't hurt, and it will let you handle backward compatibility later.
  • Finally, add the data. The data layout is specific to your needs. Typically the data is stored as a binary image of some data structure, and designing that data structure is up to you.
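The header hints above might be sketched in C# as follows. The magic string "MYFT" and the version number are hypothetical placeholders. The sketch relies on the documented behavior that BinaryWriter and BinaryReader always use little-endian byte order, which settles the endianness question on the .NET side.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

// Minimal header sketch: a 4-byte ASCII magic followed by a 16-bit format version.
// "MYFT" and CurrentVersion = 1 are hypothetical values chosen for illustration.
static class FileHeader
{
    public static readonly byte[] Magic = Encoding.ASCII.GetBytes("MYFT");
    public const ushort CurrentVersion = 1;

    public static void Write(BinaryWriter writer)
    {
        writer.Write(Magic);          // raw bytes, no length prefix
        writer.Write(CurrentVersion); // BinaryWriter always writes little-endian
    }

    // Returns the file's format version, or throws if this is not our format.
    public static ushort Read(BinaryReader reader)
    {
        byte[] magic = reader.ReadBytes(Magic.Length);
        if (!magic.SequenceEqual(Magic))
            throw new InvalidDataException("Not a MYFT file.");

        ushort version = reader.ReadUInt16();
        if (version > CurrentVersion)
            throw new InvalidDataException($"Unsupported format version {version}.");
        return version;
    }
}
```

A writer would call `FileHeader.Write` right after opening the file and then append the data section; a reader calls `FileHeader.Read` first and can branch on the returned version to stay backward compatible.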

If you need random access to your data by some index, B-trees are the way to go; if you just need to write a lot of numbers and later read them all back, an "array" will do the trick.
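A B-tree is beyond a short example, but the "array" case can be sketched. The layout below (an int32 count followed by 8-byte doubles, with no header) and the class name are assumptions for illustration. Because every element has the same size, element i sits at a computable offset and can be read with a single seek instead of a scan.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical "array" layout: int32 count, then count fixed-size doubles.
static class DoubleArrayFile
{
    const int CountSize = sizeof(int);      // 4 bytes
    const int ElementSize = sizeof(double); // 8 bytes

    public static void WriteAll(Stream stream, double[] values)
    {
        using var writer = new BinaryWriter(stream, Encoding.UTF8, leaveOpen: true);
        writer.Write(values.Length);
        foreach (double v in values)
            writer.Write(v);
    }

    public static double[] ReadAll(Stream stream)
    {
        using var reader = new BinaryReader(stream, Encoding.UTF8, leaveOpen: true);
        stream.Position = 0;
        var values = new double[reader.ReadInt32()];
        for (int i = 0; i < values.Length; i++)
            values[i] = reader.ReadDouble();
        return values;
    }

    // Random access: jump straight to element `index` using its fixed offset.
    public static double ReadAt(Stream stream, int index)
    {
        using var reader = new BinaryReader(stream, Encoding.UTF8, leaveOpen: true);
        stream.Position = 0;
        int count = reader.ReadInt32();
        if (index < 0 || index >= count)
            throw new ArgumentOutOfRangeException(nameof(index));
        stream.Position = CountSize + (long)index * ElementSize;
        return reader.ReadDouble();
    }
}
```

In a file that starts with a header, the header's length would be added to the computed offset.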

In addition, you can use the TLV (Type-Length-Value) concept for forward compatibility.
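A minimal TLV sketch, assuming a 1-byte type tag and a little-endian int32 length (both choices, and the tag values, are hypothetical). A reader that encounters a tag it does not know can use the length to skip the value, so files written by a newer version of a program remain readable by an older one.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Each field: 1-byte type tag, int32 length, then `length` bytes of value.
static class Tlv
{
    public const byte TagName = 1;    // hypothetical tag numbers
    public const byte TagPayload = 2;

    public static void WriteField(BinaryWriter writer, byte tag, byte[] value)
    {
        writer.Write(tag);
        writer.Write(value.Length);
        writer.Write(value);
    }

    // Reads every field, keeping only tags this reader understands.
    public static Dictionary<byte, byte[]> ReadKnownFields(BinaryReader reader, ISet<byte> knownTags)
    {
        var fields = new Dictionary<byte, byte[]>();
        Stream s = reader.BaseStream;
        while (s.Position < s.Length)
        {
            byte tag = reader.ReadByte();
            int length = reader.ReadInt32();
            if (knownTags.Contains(tag))
                fields[tag] = reader.ReadBytes(length);
            else
                s.Seek(length, SeekOrigin.Current); // skip a field from a newer format
        }
        return fields;
    }
}
```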


What about looking into "protocol buffers"? They are designed as an efficient, version-tolerant, portable binary format: Google's library gives you C++, Java, and Python, and community ports cover C#, Perl, Ruby, and others.

Note that Guid doesn't have a dedicated data type, but you can shim it as a message containing (essentially) a byte[].
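However a Guid ends up serialized, it becomes 16 raw bytes, and their order matters for portability: .NET's `Guid.ToByteArray()` stores the first three fields little-endian, whereas the RFC 4122 order (the order of the hex digits in the text form) is big-endian throughout. Python's `uuid` module, for example, exposes both layouts as `UUID.bytes` and `UUID.bytes_le`. The helper names below are hypothetical; the sketch converts to and from RFC 4122 order so other languages can read the 16 bytes directly.

```csharp
using System;

static class GuidBytes
{
    // Guid.ToByteArray() stores Data1 (4 bytes), Data2 (2) and Data3 (2)
    // little-endian; reversing those three groups yields RFC 4122 order.
    static void SwapGroups(byte[] b)
    {
        Array.Reverse(b, 0, 4);
        Array.Reverse(b, 4, 2);
        Array.Reverse(b, 6, 2);
    }

    public static byte[] ToRfc4122Bytes(Guid guid)
    {
        byte[] bytes = guid.ToByteArray();
        SwapGroups(bytes);
        return bytes;
    }

    public static Guid FromRfc4122Bytes(byte[] bytes)
    {
        if (bytes.Length != 16)
            throw new ArgumentException("A GUID is 16 bytes.", nameof(bytes));
        byte[] copy = (byte[])bytes.Clone(); // don't mutate the caller's array
        SwapGroups(copy);
        return new Guid(copy);
    }
}
```

With this, the GUID `00112233-4455-6677-8899-aabbccddeeff` is written as the bytes 00 11 22 ... ff, in the same order as its text form.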

For .NET-only work I would typically recommend protobuf-net (but, as its author, I'm somewhat biased). However, if you intend to use other languages later, you may do better in the long run with Jon's dotnet-protobufs; it gives you a familiar API across platforms, whereas protobuf-net uses .NET idioms.


ASCII characters '0' and '1' each take up eight bits (one byte, like any other character), so if you store your bits that way, your binary file will be eight times larger than it needs to be. A text file full of zeros and ones is not a binary file :)
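To make the size difference concrete: eight flags written as the ASCII text "10110001" take eight bytes, while packing them as bits takes one. The helper below is a hypothetical sketch of that packing, with the first flag stored in bit 0.

```csharp
using System;
using System.Text;

static class BitPacking
{
    // Packs up to 8 flags into one byte; flags[i] becomes bit i.
    public static byte Pack(bool[] flags)
    {
        if (flags.Length > 8)
            throw new ArgumentException("At most 8 flags fit in a byte.", nameof(flags));
        byte result = 0;
        for (int i = 0; i < flags.Length; i++)
            if (flags[i])
                result |= (byte)(1 << i);
        return result;
    }

    // Reverses Pack: bit i becomes flags[i].
    public static bool[] Unpack(byte packed, int count = 8)
    {
        var flags = new bool[count];
        for (int i = 0; i < count; i++)
            flags[i] = (packed & (1 << i)) != 0;
        return flags;
    }

    // For comparison: the same flags as ASCII '0'/'1' characters, one byte each.
    public static byte[] AsAsciiText(bool[] flags)
    {
        var sb = new StringBuilder(flags.Length);
        foreach (bool f in flags)
            sb.Append(f ? '1' : '0');
        return Encoding.ASCII.GetBytes(sb.ToString());
    }
}
```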

You can use BinaryWriter to write raw data directly to a stream. The only part you need to work out is how to convert your in-memory representation (usually some kind of object graph) into a sequence of bytes that BinaryWriter can consume.
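Applied to the entries from the question (a GUID plus a JSON/YAML/XML payload), a BinaryWriter-based layout might look like the sketch below. The layout (16 GUID bytes, an int32 byte count, then UTF-8 payload bytes) and the names `Entry`, `EntryFile`, and `FindPayload` are assumptions for illustration. Because every payload is length-prefixed, no delimiter is needed, and a reader can jump over payloads without parsing them, which keeps navigation fast even if payload processing is slow.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical entry type: a GUID key and a serialized JSON/YAML/XML payload.
record Entry(Guid Id, string Payload);

static class EntryFile
{
    public static void WriteEntries(Stream stream, IEnumerable<Entry> entries)
    {
        using var writer = new BinaryWriter(stream, Encoding.UTF8, leaveOpen: true);
        foreach (Entry e in entries)
        {
            byte[] payload = Encoding.UTF8.GetBytes(e.Payload);
            writer.Write(e.Id.ToByteArray()); // 16 raw bytes (.NET byte order)
            writer.Write(payload.Length);     // little-endian int32 length prefix
            writer.Write(payload);            // payload bytes; no delimiter needed
        }
    }

    // Scans entries, skipping payloads, until the requested id is found.
    public static string? FindPayload(Stream stream, Guid id)
    {
        using var reader = new BinaryReader(stream, Encoding.UTF8, leaveOpen: true);
        stream.Position = 0;
        while (stream.Position < stream.Length)
        {
            var current = new Guid(reader.ReadBytes(16));
            int length = reader.ReadInt32();
            if (current == id)
                return Encoding.UTF8.GetString(reader.ReadBytes(length));
            stream.Seek(length, SeekOrigin.Current); // skip without parsing
        }
        return null;
    }
}
```

For files that are read by other languages, note that `ToByteArray()` does not produce the byte order most non-.NET UUID libraries expect.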

However, if your main concern is portability, I'd recommend against a binary format in general. XML was designed precisely to address portability and interoperability. It is a verbose and bulky file format, but that's the trade-off you make to solve those problems. If a human-readable format is not a requirement, Marc's answer is the way to go. No need to reinvent the portability wheel!


It depends on what kind of data you will write to the binary file and what the file is for. Are they class objects or just record data? If it is record data, I would recommend storing it as XML. That way you can use schema validation to verify that the file meets your standards. Both Java and .NET have tools for importing and exporting data to and from XML.
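As a sketch of the schema-validation idea on the .NET side, the helper below validates an XML string against an XSD using `XmlSchemaSet` and `XmlReaderSettings` from System.Xml. The class and method names are hypothetical. Schema errors are collected through `ValidationEventHandler`; XML that is not well-formed still throws `XmlException`.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Schema;

static class XmlRecordValidation
{
    // Returns false if any schema validation error is reported while reading.
    public static bool IsValid(string xml, string xsd)
    {
        var schemas = new XmlSchemaSet();
        using (var xsdReader = XmlReader.Create(new StringReader(xsd)))
            schemas.Add(null, xsdReader); // null: take targetNamespace from the schema

        var settings = new XmlReaderSettings
        {
            ValidationType = ValidationType.Schema,
            Schemas = schemas
        };
        bool valid = true;
        settings.ValidationEventHandler += (sender, e) => valid = false;

        using (var reader = XmlReader.Create(new StringReader(xml), settings))
            while (reader.Read()) { } // validation happens as the document is read
        return valid;
    }
}
```

For example, with a schema that declares a `record` element containing a `payload` element and a required `id` attribute, `IsValid("<record><payload>hi</payload></record>", xsd)` would return false because `id` is missing.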


Suppose your format is:

  struct Format
  {
      struct Header // 1
      {
          byte a;
          bool b1, b2, b3, b4, b5, b6, b7, b8;
          string name;
      }

      struct Container // 1...*
      {
          MyTypeEnum Type;
          byte[] data;
      }
  }

  enum MyTypeEnum { Sound, Video, Image }

Then the file would contain this sequence:


  byte    // a
  byte    // b (b1...b8 packed into one byte)
  int     // name size
  char[]  // name (of the size given above; remember that char is 16 bits in .NET)
  int     // type (MyTypeEnum)
  int     // data size
  byte[]  // data (of the size given above)


You can then repeat the last three items as many times as you want (one set per Container).

For reading, use BinaryReader, which supports reading bytes, integers, and arrays of bytes. For writing, there is a matching BinaryWriter.
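A round trip of the sequence above with BinaryWriter and BinaryReader might look like the sketch below. The class and method names are hypothetical, and a few choices are assumptions: the eight bools are packed into the single `b` byte with b1 as bit 0, and containers are read until the end of the stream. Rather than writing a `char[]` (BinaryWriter encodes it with the writer's encoding, so the on-disk size depends on the text), the sketch stores the name as UTF-8 bytes prefixed by their byte count, which is easier for other languages to read.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

enum MyTypeEnum { Sound, Video, Image }

// Hypothetical in-memory model matching the Header/Container layout.
class Header
{
    public byte A;
    public bool[] Flags = new bool[8]; // b1..b8
    public string Name = "";
}

class Container
{
    public MyTypeEnum Type;
    public byte[] Data = Array.Empty<byte>();
}

static class FormatIO
{
    public static void Write(Stream stream, Header header, IEnumerable<Container> containers)
    {
        using var w = new BinaryWriter(stream, Encoding.UTF8, leaveOpen: true);
        w.Write(header.A);                        // byte: a
        byte b = 0;
        for (int i = 0; i < 8; i++)
            if (header.Flags[i]) b |= (byte)(1 << i);
        w.Write(b);                               // byte: b1..b8 packed
        byte[] name = Encoding.UTF8.GetBytes(header.Name);
        w.Write(name.Length);                     // int: name size in bytes
        w.Write(name);                            // name bytes
        foreach (Container c in containers)
        {
            w.Write((int)c.Type);                 // int: type
            w.Write(c.Data.Length);               // int: data size
            w.Write(c.Data);                      // byte[]: data
        }
    }

    public static (Header, List<Container>) Read(Stream stream)
    {
        using var r = new BinaryReader(stream, Encoding.UTF8, leaveOpen: true);
        var header = new Header { A = r.ReadByte() };
        byte b = r.ReadByte();
        for (int i = 0; i < 8; i++)
            header.Flags[i] = (b & (1 << i)) != 0;
        header.Name = Encoding.UTF8.GetString(r.ReadBytes(r.ReadInt32()));

        var containers = new List<Container>();
        while (stream.Position < stream.Length)   // repeat until end of file
        {
            var c = new Container { Type = (MyTypeEnum)r.ReadInt32() };
            c.Data = r.ReadBytes(r.ReadInt32());
            containers.Add(c);
        }
        return (header, containers);
    }
}
```

Because every variable-length item carries its own size, the reader never needs delimiters; a production reader would also check that `ReadBytes` returned the full length.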

Also, remember that Microsoft .NET (and therefore a Windows/Intel-based machine) is little-endian. So are BinaryReader and BinaryWriter, which always use little-endian byte order.
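The byte order is easy to check, and it matters as soon as another language is involved: Java's `DataOutputStream`, for instance, writes integers big-endian. The sketch below (hypothetical class and method names) shows BinaryWriter's little-endian output and reads a big-endian int32 with `BinaryPrimitives` from System.Buffers.Binary, which is available in .NET Core 2.1 and later.

```csharp
using System;
using System.Buffers.Binary;
using System.IO;

static class EndiannessDemo
{
    // BinaryWriter writes multi-byte integers little-endian on every platform.
    public static byte[] WriteWithBinaryWriter(int value)
    {
        var ms = new MemoryStream();
        using (var w = new BinaryWriter(ms))
            w.Write(value);
        return ms.ToArray(); // ToArray still works after the stream is closed
    }

    // Reads a big-endian int32, e.g. one written by Java's DataOutputStream.writeInt.
    public static int ReadBigEndianInt32(BinaryReader reader)
    {
        byte[] bytes = reader.ReadBytes(4);
        if (bytes.Length != 4)
            throw new EndOfStreamException();
        return BinaryPrimitives.ReadInt32BigEndian(bytes);
    }
}
```

Here `WriteWithBinaryWriter(0x01020304)` yields the bytes 04 03 02 01, while `ReadBigEndianInt32` interprets 01 02 03 04 as 0x01020304.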

