Marshalling multiple protobufs to a file

Question

Background:

I am using Google protobuf and would like to read and write several gigabytes of protobuf-marshalled data to a file using C++. Since it is recommended to keep the size of each protobuf object under 1 MB, I figured a binary stream (illustrated below) written to a file would work. Each offset contains the number of bytes until the next offset, and so on until the end of the file is reached. This way each protobuf can stay under 1 MB, and I can chain them together to my heart's content.

[int32 offset] [protobuf blob 1] [int32 offset] [protobuf blob 2] ... [eof]

I have an implementation that works on Github:

src/glob.hpp
src/glob.cpp
test/readglob.cpp
test/writeglob.cpp

However, I feel I have written some poor code and would be grateful for tips on how to improve it. Thus,

Questions:

I use reinterpret_cast<char*> to read and write the 32-bit integers to and from the binary fstream. Since I use protobuf, I assume that all machines are little-endian. I also assert that an int is indeed 4 bytes. Is there a better way to read and write a 32-bit integer to a binary fstream given these two limiting assumptions?
When reading from the fstream, I create a temporary fixed-length char buffer and then pass that buffer to the protobuf library for decoding with ParseFromArray, since ParseFromIstream would consume the entire stream. I would rather just tell the library to read no more than the next N bytes from the fstream, but there seems to be no such functionality in protobuf. What would be the most idiomatic way to pass a function no more than N bytes of an fstream? Or is my design backwards enough that I should consider a different approach altogether?

Edit:

@codymanix: I cast to char since istream::read requires a char array, if I am not mistaken. I also do not use the extraction operator >>, since I read that it is bad form to use it with binary streams. Or is that just another piece of bogus advice?
@Martin York: Removed new/delete in favor of std::vector<char>. glob.cpp is now updated. Thanks!

c++ protocol-buffers

Nicholas Palko Aug 18 '10 at 14:16

1 answer

Martin York · Accepted Answer · 2010-08-18T14:42:52+0000

Do not use new[] / delete[].

Use a std::vector instead, since its memory is guaranteed to be released if an exception is thrown.

Do not assume that read() will return all the bytes you requested.
Check gcount() to make sure you got what you asked for.

Rather than having glob implement both input and output depending on a switch in the constructor, implement two specialized classes, in the way ifstream and ofstream are split. This will simplify both the interface and its use.