Binary version of iostream

I am writing a binary version of iostreams. It essentially allows you to write binary files, but gives you great control over the file format. Usage example:

my_file << binary::u32le << my_int << binary::u16le << my_string; 

This will write my_int as an unsigned 32-bit little-endian integer, and my_string as a length-prefixed string (where the length prefix is a u16le). To read the file back, you flip the arrows. It works great. However, I've hit a snag in the design, and I'm still on the fence about it, so it's time to ask. (We make a couple of assumptions for the moment, such as 8-bit bytes, two's complement ints, and IEEE floats.)
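To make the format concrete, here is my reading of what that line would put in the file for my_int == 1 and my_string == "hi" (a worked illustration of the described format, not output from the actual library):

    // Expected file contents for my_int == 1, my_string == "hi":
    const unsigned char expected[] = {
        0x01, 0x00, 0x00, 0x00,  // my_int as u32le
        0x02, 0x00,              // my_string's length as u16le
        'h', 'i'                 // the string's bytes
    };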

Under the hood, iostreams use streambufs. This is a fantastic design: iostreams handles serializing an int into text, and lets the underlying streambuf handle the rest. That way you get cout, fstream, stringstream, etc. for free. Both iostreams and streambufs are templates, usually instantiated on char, but sometimes on wchar_t. My data, however, is a stream of bytes, which is best represented by unsigned char.
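That layering is easy to see with standard formatted output: the same insertion code targets the console, a file, or an in-memory string purely by swapping the streambuf behind the ostream:

    #include <iostream>
    #include <fstream>
    #include <sstream>

    // The serialization code only talks to std::ostream; the streambuf
    // behind it decides where the bytes actually go.
    void dump(std::ostream &out, int value) {
        out << value << '\n';
    }

    int main() {
        std::ofstream      file("out.txt");
        std::ostringstream text;
        dump(std::cout, 42);  // console
        dump(file, 42);       // file
        dump(text, 42);       // in-memory string
    }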

My first attempts were unsigned char-based class templates. std::basic_string works well enough, but streambuf does not: I ran into several problems with a class called codecvt that I could never get working for an unsigned char specialization. This raises two questions:

1) Why is streambuf responsible for such things? Code conversion seems outside a streambuf's responsibility: streambufs should take a stream and buffer data to/from it, nothing more. Something as high-level as code conversion seems to belong in iostreams.

Since I could not get templated streambufs to work with unsigned char, I went back to char and just cast data between char and unsigned char. I tried to minimize the number of casts, for obvious reasons. Most of the data ultimately passes through a read() or write() function, which then calls the underlying streambuf (applying the cast in the process). The read function is basically:

    size_t read(unsigned char *buffer, size_t size) {
        size_t ret;
        ret = stream()->sgetn(reinterpret_cast<char *>(buffer), size);
        // deal with ret for return size, eof, errors, etc.
        ...
    }
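The write path is presumably just the mirror image; a sketch (sputn and the surrounding error handling are my assumption of how it looks):

    size_t write(const unsigned char *buffer, size_t size) {
        std::streamsize ret =
            stream()->sputn(reinterpret_cast<const char *>(buffer),
                            static_cast<std::streamsize>(size));
        // deal with ret for short writes, errors, etc.
        return static_cast<size_t>(ret);
    }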

Good decision, bad decision?


The first two answers suggest some additional information is needed. First: projects such as boost::serialization were considered, but they operate at a higher level, in that they define their own binary format. This is meant for lower-level reading/writing, where you need to define the format yourself, or the format is already defined, or bulk metadata simply isn't needed or wanted.

Second, some have asked about binary::u32le. It is an instance of a class that carries the desired endianness and width (and, possibly, signedness in the future). The stream holds a copy of the last such instance passed to it and uses it for serialization. This was a bit of a workaround; I originally tried overloading operator<< like this:

    bostream &operator << (uint8_t n);
    bostream &operator << (uint16_t n);
    bostream &operator << (uint32_t n);
    bostream &operator << (uint64_t n);

However, at the time that didn't seem to work: I had several problems with ambiguous function calls. This was especially true for constants, although, as one poster suggested, you could cast them or just declare them as const <type>. I seem to remember there was another, bigger problem as well.
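For what it's worth, the ambiguity is easy to reproduce: with only sized-integer overloads, a plain int argument converts equally well to every one of them (a minimal sketch, not the actual bostream class):

    #include <stdint.h>

    struct bostream {
        bostream &operator<<(uint8_t)  { return *this; }
        bostream &operator<<(uint16_t) { return *this; }
        bostream &operator<<(uint32_t) { return *this; }
        bostream &operator<<(uint64_t) { return *this; }
    };

    void example(bostream &out, int my_int) {
        // out << 5;       // error: ambiguous -- int converts to all four widths
        // out << my_int;  // same problem
        out << static_cast<uint32_t>(my_int);  // OK: the cast picks the width
        const uint16_t n = 5;
        out << n;                              // OK: the declared type picks it
    }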

+7
c++ iostream binary streambuf
4 answers

I agree with the other answers here. I needed to do practically the same thing you're doing, and I looked at overloading << / >>, but I came to the conclusion that iostreams simply aren't designed to accommodate it. For one thing, I didn't want to have to subclass the stream classes just to be able to define my overloads.

My solution (which was only needed for temporary serialization of data on a single machine, and therefore didn't need to address endianness) was based on this template:

    #include <istream>
    #include <ostream>
    #include <boost/utility/enable_if.hpp>
    #include <boost/type_traits/is_pod.hpp>

    // deducible template argument read
    template <class T>
    void read_raw(std::istream& stream, T& value,
                  typename boost::enable_if< boost::is_pod<T> >::type* dummy = 0)
    {
        stream.read(reinterpret_cast<char*>(&value), sizeof(value));
    }

    // explicit template argument read
    template <class T>
    T read_raw(std::istream& stream)
    {
        T value;
        read_raw(stream, value);
        return value;
    }

    template <class T>
    void write_raw(std::ostream& stream, const T& value,
                   typename boost::enable_if< boost::is_pod<T> >::type* dummy = 0)
    {
        stream.write(reinterpret_cast<const char*>(&value), sizeof(value));
    }

Then I overloaded read_raw / write_raw for any non-POD types (like strings). Note that only the first version of read_raw needs to be overloaded; if you use ADL correctly, the second (one-argument) version can call two-argument overloads that are defined later, in other namespaces.
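For example, a string overload might look something like this (my own sketch; the answer doesn't give one, and the u32 length prefix is an assumed format):

    #include <istream>
    #include <ostream>
    #include <string>
    #include <stdint.h>

    // Non-POD overloads: a string written as a u32 length followed by its bytes.
    inline void write_raw(std::ostream& stream, const std::string& value)
    {
        uint32_t len = static_cast<uint32_t>(value.size());
        write_raw(stream, len);              // POD overload from above
        stream.write(value.data(), len);
    }

    inline void read_raw(std::istream& stream, std::string& value)
    {
        uint32_t len = 0;
        read_raw(stream, len);               // POD overload from above
        value.resize(len);
        if (len != 0)
            stream.read(&value[0], len);
    }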

Writing example:

    int32_t x;
    int64_t y;
    int8_t z;
    write_raw(os, x);
    write_raw(os, y);
    write_raw<int16_t>(os, z); // explicitly write int8_t as int16_t

Reading example:

    int32_t x = read_raw<int32_t>(is);  // explicit form
    int64_t y;
    read_raw(is, y);                    // implicit form
    int8_t z = numeric_cast<int8_t>(read_raw<int16_t>(is));

It's not as sexy as overloaded operators, and not everything fits on one line easily (which I usually avoid anyway, since debugger breakpoints are line-oriented), but I think it turned out simpler, more obvious, and not much more verbose.

+2

As I see it, the stream modifiers you use to indicate types would be more appropriate for specifying attributes, packing, or other metadata. The compiler should be left to handle the types themselves. At least, that seems to be the way the STL design was intended.

If you use overloads to distinguish the types automatically, you only need to specify a type when it differs from the declared type of the variable:

    Stream& operator<<(int8_t);
    Stream& operator<<(uint8_t);
    Stream& operator<<(int16_t);
    Stream& operator<<(uint16_t);
    // etc.

    uint32_t x;
    stream << x << (uint16_t)x;

Reading into a type other than the declared one is slightly messier. In general, though, you should avoid reading or writing through variables whose type differs from the type in the serialized output.
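For example, something along these lines (a sketch, assuming matching operator>> overloads exist on the reading side):

    // Writing a uint32_t as 16 bits is a one-expression cast;
    // reading it back through the narrower type needs a temporary.
    uint32_t x = 1000;
    stream << (uint16_t)x;   // write the low 16 bits

    uint16_t tmp;
    stream >> tmp;           // read 16 bits...
    x = tmp;                 // ...then widen back into the uint32_t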

I believe the default std::codecvt effectively does nothing, returning noconv for everything; it only really does work for "wide" character streams. Can you provide a similar definition of codecvt? If for some reason it isn't practical to define a no-op codecvt for your stream, then I don't see any problem with your casting solution, especially since it's isolated in one place.

Finally, are you sure you wouldn't be better off using some standard serialization library like Boost, instead of rolling your own?

+1

We needed to do something similar to what you're doing, but we went a different way. I'm curious how you defined your interface. The part I don't see how you can handle is the manipulators you've defined (binary::u32le, binary::u16le).

With basic_streams, a manipulator controls how all subsequent items are read/written, but in your case that probably doesn't make sense, since the size (part of your manipulator's information) depends on the variable being passed in or out.

    binary_istream in;
    int i;
    int i2;
    short s;
    in >> binary::u16le >> i >> binary::u32le >> i2 >> s;

In the code above, it might make sense that, even though the variable i is 32 bits (assuming int is 32 bits), you only want to extract 16 bits from the serialized stream, while you want to extract a full 32 bits into i2. After that, either the user is forced to supply a manipulator for every subsequent item being transferred, or the manipulator remains in effect, so when the short is passed in, 32 bits are read with a possible overflow; either way, the user is likely to get unexpected results.
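Spelled out, the two options being contrasted seem to be (my reading, using the question's hypothetical types):

    // Option 1: "sticky" manipulators, like std::hex -- u32le stays in effect,
    // so the short s is also read as 32 bits, probably not what was wanted:
    //     in >> binary::u16le >> i >> binary::u32le >> i2 >> s;
    //
    // Option 2: per-item manipulators -- the user must tag every variable:
    //     in >> binary::u16le >> i >> binary::u32le >> i2 >> binary::u16le >> s;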

The size doesn't seem (in my opinion) to belong in the manipulators.

As an aside, in our case, since we had other requirements around run-time type definitions, we ended up building our own meta-type system for creating types at run time (a variant type), and then implementing (de)serialization for those types (formatter-style), so our serializers don't work with the basic C++ types but rather with serializer/data pairs.

0

I would not use operator<< for this, because it is too closely associated with formatted text input/output.

I would not use operator overloading for this at all. I would find another idiom.

0
