Easy-to-use, extensible serialization / marshalling?

I have a question about serializing data structures. There are many ways to serialize data structures (also called marshalling or deflating, see the Wikipedia article). Every programming language, framework, standard, or library seems to bring its own serialization methods with it. Many also define their own data/interface description language (I would prefer the data structure description to be language-dependent, i.e. defined only inside the code). Just to name a few (see the Wikipedia article): COM IDL, CORBA IDL, Thrift IDL, Google's .proto (Protocol Buffers), XSD, ASN.1, etc. Some of these toolchains can generate native data structures for a given language, plus the code for serializing and deserializing those structures.

I did some research on this, but I still haven't decided. So my question is: which serialization method should I use?

My requirements are extensibility, space efficiency (binary at least), efficient data access, ease of use (ideally generated code with getters and setters) and C++ compatibility.

Extensibility must provide backward and forward compatibility. To be more specific: the data formats I write often grow over time, because I add new data fields that I could not foresee at the beginning of development. I would like a newer version of the software to be able to read data saved in an older format; fields missing from the old data can be filled with default values or something similar. Conversely, I would like software compiled against the "old" data description to be able to read data written with a newer description; unknown data fields should then be ignored (possibly with a warning).
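Both requirements usually come down to tagging each field on disk so that a reader can recognize, default, or skip it. The sketch below is a hypothetical tag-length-value (TLV) layout made up for illustration (1-byte tag, 4-byte little-endian length, payload); it is not any particular library's format. `PersonV2`, `write_v1`, and `read_v2` are hypothetical names: a v2 reader fills a default for the field that v1 data lacks (backward compatibility) and counts, then skips, tags it does not know (forward compatibility).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical TLV format (illustration only):
// each field = 1-byte tag, 4-byte little-endian length, payload bytes.
using Bytes = std::vector<std::uint8_t>;

void put_u32(Bytes& out, std::uint32_t v) {
    for (int i = 0; i < 4; ++i) out.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
}

std::uint32_t get_u32(const Bytes& in, std::size_t pos) {
    if (pos + 4 > in.size()) throw std::runtime_error("truncated");
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i) v |= static_cast<std::uint32_t>(in[pos + i]) << (8 * i);
    return v;
}

void put_field(Bytes& out, std::uint8_t tag, const Bytes& payload) {
    out.push_back(tag);
    put_u32(out, static_cast<std::uint32_t>(payload.size()));
    out.insert(out.end(), payload.begin(), payload.end());
}

// Version 1 knew only "name" (tag 1); version 2 added "age" (tag 2).
// A field keeps its tag forever, so old and new readers agree on meaning.
struct PersonV2 {
    std::string name;
    std::uint32_t age = 0;   // default used when reading old (v1) data
    int unknown_fields = 0;  // fields this version does not understand
};

Bytes write_v1(const std::string& name) {
    Bytes out;
    put_field(out, 1, Bytes(name.begin(), name.end()));
    return out;
}

PersonV2 read_v2(const Bytes& in) {
    PersonV2 p;
    std::size_t pos = 0;
    while (pos < in.size()) {
        std::uint8_t tag = in[pos];
        std::uint32_t len = get_u32(in, pos + 1);
        std::size_t start = pos + 5;
        if (start + len > in.size()) throw std::runtime_error("truncated");
        if (tag == 1) {
            p.name.assign(in.begin() + start, in.begin() + start + len);
        } else if (tag == 2 && len == 4) {
            p.age = get_u32(in, start);
        } else {
            ++p.unknown_fields;  // e.g. written by a newer version: skip, maybe warn
        }
        pos = start + len;       // the length lets us jump over any field
    }
    return p;
}
```

The cost of this design is a few bytes of tag and length per field; formats such as Protocol Buffers shrink that overhead with variable-length integers.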

Any recommendations? Pointers to further reading on this subject would also be appreciated.

--- Edit ---

1) boost::serialization seems pretty popular. It has some really nice features, the documentation is very good, and the syntax seems pretty straightforward. Maybe I'm a bit picky, but there are some things I don't like: I don't see how it could handle forward compatibility (see 4). I would also prefer generated code.

2) Google protobuf seems to suit my needs best, but I haven't looked into it in depth yet. It seems to cope well with forward and backward compatibility (see 5). There are code generators for different languages, and the developers are aware of very similar concepts (see FAQ). I will study protocol buffers in more depth.
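Protocol Buffers achieves forward compatibility at the wire-format level: every field starts with a key `(field_number << 3) | wire_type`, and the wire type alone tells a reader how many bytes to skip for a field it does not know. The hand-written sketch below follows the encoding rules in the protobuf encoding documentation (base-128 varints; wire types 0, 1, 2, 5). It is not the protobuf library API, which you would normally use through `protoc`-generated classes; `read_field1` is a hypothetical "old" reader that only knows field 1.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Base-128 varint: 7 payload bits per byte, least significant group first,
// high bit set on every byte except the last.
void put_varint(Bytes& out, std::uint64_t v) {
    while (v >= 0x80) {
        out.push_back(static_cast<std::uint8_t>((v & 0x7F) | 0x80));
        v >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(v));
}

std::uint64_t get_varint(const Bytes& in, std::size_t& pos) {
    std::uint64_t v = 0;
    for (int shift = 0; shift < 64; shift += 7) {
        if (pos >= in.size()) throw std::runtime_error("truncated varint");
        std::uint8_t b = in[pos++];
        v |= static_cast<std::uint64_t>(b & 0x7F) << shift;
        if ((b & 0x80) == 0) return v;
    }
    throw std::runtime_error("varint too long");
}

// Key = (field_number << 3) | wire_type; wire type 0 means "varint".
void put_varint_field(Bytes& out, std::uint32_t field, std::uint64_t value) {
    put_varint(out, (static_cast<std::uint64_t>(field) << 3) | 0);
    put_varint(out, value);
}

// Skip a field we do not know, using only its wire type.
void skip_field(const Bytes& in, std::size_t& pos, std::uint32_t wire_type) {
    switch (wire_type) {
        case 0: get_varint(in, pos); break;  // varint
        case 1: pos += 8; break;             // fixed 64-bit
        case 2: {                            // length-delimited
            std::uint64_t len = get_varint(in, pos);
            if (len > in.size() - pos) throw std::runtime_error("truncated field");
            pos += static_cast<std::size_t>(len);
            break;
        }
        case 5: pos += 4; break;             // fixed 32-bit
        default: throw std::runtime_error("unsupported wire type");
    }
    if (pos > in.size()) throw std::runtime_error("truncated field");
}

// Hypothetical old reader: knows only field 1, skips everything else.
std::uint64_t read_field1(const Bytes& in, int& skipped) {
    std::uint64_t field1 = 0;  // default when the field is absent
    std::size_t pos = 0;
    skipped = 0;
    while (pos < in.size()) {
        std::uint64_t key = get_varint(in, pos);
        std::uint32_t field = static_cast<std::uint32_t>(key >> 3);
        std::uint32_t wire_type = static_cast<std::uint32_t>(key & 0x7);
        if (field == 1 && wire_type == 0) {
            field1 = get_varint(in, pos);
        } else {
            skip_field(in, pos, wire_type);
            ++skipped;
        }
    }
    return field1;
}
```

For example, field 1 with the value 150 encodes as the three bytes `08 96 01`, and a field added later (say, field 7) is simply skipped by `read_field1`.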

3) Boost.Spirit is not really what I'm looking for.

Tags: c++ serialization marshalling thrift

--- Answer 1 ---

I used the Boost serialization library for a while. It is extensible, robust, efficient and supports separate versioning for each object you serialize. All these features, of course, mean that it is a complex beast, and it takes some time to learn. It is also not that fast to compile. And if you ever try to port it to a platform that is not officially supported, expect to debug some very confusing code. File compatibility across platforms can be a bit flaky, and forward compatibility will not work. In general, Boost serialization is not a good choice if you need separate application instances to exchange data with each other. However, it is not so bad for a self-contained project.
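Boost.Serialization passes a per-class version number (set with `BOOST_CLASS_VERSION`) to each class's `serialize` function, and the loading code branches on it. The stdlib-only sketch below only mimics that idea; `save`, `load`, and the text layout are hypothetical and are not Boost's API or archive format. It shows why new code reads old data fine, while old code cannot read new data: fields are written back-to-back without tags or lengths, so an old reader has no way to know where unknown data ends.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical mimic of per-class versioning (NOT Boost's API):
// the writer stores the class version, the reader branches on it.
struct Person {
    std::string name;
    std::uint32_t age = 0;  // added in version 2
};

void save(std::ostream& os, const Person& p, unsigned version) {
    os << version << ' ' << p.name.size() << ' ' << p.name;
    if (version >= 2) os << ' ' << p.age;
}

// `known_version` is the newest version this build of the reader understands.
Person load(std::istream& is, unsigned known_version) {
    unsigned version = 0;
    std::size_t len = 0;
    is >> version >> len;
    if (version > known_version)
        // No tags or lengths: the reader cannot skip fields it never heard of.
        throw std::runtime_error("archive written by a newer version");
    Person p;
    is.get();  // consume the single space between the length and the name
    p.name.resize(len);
    is.read(&p.name[0], static_cast<std::streamsize>(len));
    if (version >= 2) is >> p.age;  // older archives keep the default age
    return p;
}
```

Loading a version-1 archive with `known_version = 2` works and leaves `age` at 0; loading a version-2 archive with `known_version = 1` has to fail.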

http://www.boost.org/doc/libs/1_46_0/libs/serialization/doc/index.html

Boost also has the newer Spirit library for more general parsing/output generation, but I have not used it and would not recommend it based on my first impressions: it takes some digging just to understand what the library is intended for.

In the end, for simpler projects, writing your own serialization library may not be a bad choice: it is not too difficult, and you get exactly the features you need. It is somewhat disappointing that the C++ world still does not seem to have solved serialization, but that is the conclusion I came to the last time I had to deal with it. Using Boost serialization for some time nonetheless gave me a good idea of what to aim for in my own implementation.
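One idea worth borrowing from Boost when rolling your own is the single `serialize` function that uses `ar & member` for both saving and loading, so the layout is described only once. The sketch below is a hypothetical, deliberately minimal home-grown version (`OutArchive`, `InArchive`, and `Point` are made-up names, no versioning, only `uint32_t` and `std::string` supported). It writes integers in a fixed little-endian order, which avoids one common source of the cross-platform flakiness mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal archive pair: one serialize() serves save and load.
class OutArchive {
public:
    std::vector<std::uint8_t> bytes;
    OutArchive& operator&(const std::uint32_t& v) {
        for (int i = 0; i < 4; ++i)  // fixed little-endian, whatever the host order
            bytes.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
        return *this;
    }
    OutArchive& operator&(const std::string& s) {
        *this & static_cast<std::uint32_t>(s.size());  // length prefix
        bytes.insert(bytes.end(), s.begin(), s.end());
        return *this;
    }
};

class InArchive {
public:
    explicit InArchive(std::vector<std::uint8_t> b) : bytes_(std::move(b)) {}
    InArchive& operator&(std::uint32_t& v) {
        need(4);
        v = 0;
        for (int i = 0; i < 4; ++i)
            v |= static_cast<std::uint32_t>(bytes_[pos_++]) << (8 * i);
        return *this;
    }
    InArchive& operator&(std::string& s) {
        std::uint32_t len = 0;
        *this & len;
        need(len);
        s.assign(bytes_.begin() + pos_, bytes_.begin() + pos_ + len);
        pos_ += len;
        return *this;
    }
private:
    void need(std::size_t n) const {  // reject truncated input instead of over-reading
        if (bytes_.size() - pos_ < n) throw std::runtime_error("truncated input");
    }
    std::vector<std::uint8_t> bytes_;
    std::size_t pos_ = 0;
};

struct Point {
    std::uint32_t x = 0, y = 0;
    std::string label;
    template <class Archive>
    void serialize(Archive& ar) { ar & x & y & label; }  // layout described once
};
```

Calling `p.serialize(out)` on an `OutArchive` and then `q.serialize(in)` on an `InArchive` built from `out.bytes` round-trips the object; adding forward compatibility would still require tags or lengths per field.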

--- Answer 2 ---

boost::serialization is excellent:

  • Support for various archive versions
  • Good support for most data structures (pointers, vectors ...)
  • Very fast (10 seconds for 1 GB, so the limit is your hard drive).
  • Fairly easy to use
  • On-the-fly compression when used with boost::iostreams

Disadvantages:

  • Archives may not be compatible from one platform to another
  • C++ only; no sharing with other languages

A good, increasingly popular alternative is Google's Protocol Buffers: http://code.google.com/p/protobuf/

  • Language independent
  • Version support
  • Very fast

So, if you want to exchange data between different systems, I would go with protocol buffers. However, if you have a single application, I would use boost::serialization.
