A simple working example of GzipOutputStream and GzipInputStream with protocol buffers

After several days of experimenting with protocol buffers, I tried compressing the files. With Python this is fairly easy to do and does not require any playing around with threads.

Since most of our code is written in C++, I would like to compress/decompress the files in the same language. I tried the Boost gzip library but could not get it to work (it does not compress):

int writeEventCollection(HEP::MyProtoBufClass* protobuf, std::string filename,
                         unsigned int compressionLevel) {
    ofstream file(filename.c_str(), ios_base::out | ios_base::binary);
    filtering_streambuf<output> out;
    out.push(gzip_compressor(compressionLevel));
    out.push(file);
    if (!protobuf->SerializeToOstream(&file)) { // serialising to the wrong stream, I assume
        cerr << "Failed to write ProtoBuf." << endl;
        return -1;
    }
    return 0;
}

I searched for examples using GzipOutputStream and GzipInputStream with protocol buffers, but could not find any.

As you have probably noticed, I am a newbie with streams at best and would really appreciate a fully working example, as in http://code.google.com/apis/protocolbuffers/docs/cpptutorial.html (I have my address book; how do I save it in a gzipped file?).

Thanks in advance.

EDIT: working examples.

Example 1, following the answer here on Stack Overflow:

int writeEventCollection(shared_ptr<HEP::EventCollection> eCollection,
                         std::string filename, unsigned int compressionLevel) {
    filtering_ostream out;
    out.push(gzip_compressor(compressionLevel));
    out.push(file_sink(filename, ios_base::out | ios_base::binary));
    if (!eCollection->SerializeToOstream(&out)) {
        cerr << "Failed to write event collection." << endl;
        return -1;
    }
    return 0;
}
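For completeness (not part of the original post): reading such a file back works the same way, with a gzip_decompressor pushed onto a filtering_istream. A minimal sketch under the same assumptions as above; the function name readEventCollection is my own:

int readEventCollection(shared_ptr<HEP::EventCollection> eCollection,
                        std::string filename) {
    filtering_istream in;
    in.push(gzip_decompressor());
    in.push(file_source(filename, ios_base::in | ios_base::binary));
    if (!eCollection->ParseFromIstream(&in)) {
        cerr << "Failed to read event collection." << endl;
        return -1;
    }
    return 0;
}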

Example 2, following the answer on the Google Protobuf discussion group:

int writeEventCollection2(shared_ptr<HEP::EventCollection> eCollection,
                          std::string filename, unsigned int compressionLevel) {
    using namespace google::protobuf::io;
    int filedescriptor = open(filename.c_str(), O_WRONLY | O_CREAT | O_TRUNC,
                              S_IREAD | S_IWRITE);
    if (filedescriptor == -1) {
        throw "open failed on output file";
    }
    FileOutputStream file_stream(filedescriptor);
    GzipOutputStream::Options options;
    options.format = GzipOutputStream::GZIP;
    options.compression_level = compressionLevel;
    GzipOutputStream gzip_stream(&file_stream, options);
    if (!eCollection->SerializeToZeroCopyStream(&gzip_stream)) {
        cerr << "Failed to write event collection." << endl;
        return -1;
    }
    // Close the gzip stream first so its buffered data and the gzip trailer
    // are flushed into file_stream; FileOutputStream::Close() then flushes
    // and closes the underlying file descriptor.
    gzip_stream.Close();
    file_stream.Close();
    return 0;
}
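And the reading counterpart with GzipInputStream (again not from the original post, just a sketch under the same assumptions; readEventCollection2 is a made-up name):

int readEventCollection2(shared_ptr<HEP::EventCollection> eCollection,
                         std::string filename) {
    using namespace google::protobuf::io;
    int filedescriptor = open(filename.c_str(), O_RDONLY);
    if (filedescriptor == -1) {
        throw "open failed on input file";
    }
    FileInputStream file_stream(filedescriptor);
    // GZIP matches the format chosen when writing; AUTO would also detect it.
    GzipInputStream gzip_stream(&file_stream, GzipInputStream::GZIP);
    if (!eCollection->ParseFromZeroCopyStream(&gzip_stream)) {
        cerr << "Failed to read event collection." << endl;
        return -1;
    }
    close(filedescriptor);
    return 0;
}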

Some performance comments (reading our current format and writing ProtoBuf, for 11,146 files):

Example 1:

 real 13m1.185s
 user 11m18.500s
 sys  0m13.430s
 CPU usage: 65-70%
 Size of test sample: 4.2 GB (uncompressed: 7.7 GB; our current compressed format: 7.7 GB)

Example 2:

 real 12m37.061s
 user 10m55.460s
 sys  0m11.900s
 CPU usage: 90-100%
 Size of test sample: 3.9 GB

It seems that Google's method uses the CPU more efficiently, is slightly faster (although I would expect that to be within measurement accuracy), and produces a dataset about 7% smaller with the same compression setting.

Answer:

Your assumption is correct: the code you posted does not work because you write directly to the ofstream rather than through the filtering_streambuf. To make it work, you can use a filtering_ostream instead:

ofstream file(filename.c_str(), ios_base::out | ios_base::binary);
filtering_ostream out;
out.push(gzip_compressor(compressionLevel));
out.push(file);

if (!protobuf->SerializeToOstream(&out)) {
    // ... etc.
}

Or, more succinctly, using file_sink:

filtering_ostream out;
out.push(gzip_compressor(compressionLevel));
out.push(file_sink(filename, ios_base::out | ios_base::binary));

if (!protobuf->SerializeToOstream(&out)) {
    // ... etc.
}
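Applied to the address book from the tutorial the question links to, the whole thing boils down to a few lines. A sketch, assuming the tutorial::AddressBook type generated by the tutorial's .proto; saveAddressBook is a made-up name:

#include <fstream>
#include <boost/iostreams/filtering_stream.hpp>
#include <boost/iostreams/filter/gzip.hpp>
#include <boost/iostreams/device/file.hpp>
#include "addressbook.pb.h" // generated by protoc from the tutorial's .proto

using namespace boost::iostreams;

bool saveAddressBook(const tutorial::AddressBook& book,
                     const std::string& filename) {
    filtering_ostream out;
    out.push(gzip_compressor()); // default compression level
    out.push(file_sink(filename, std::ios_base::out | std::ios_base::binary));
    // The stream is flushed and the gzip trailer written when out goes out of scope.
    return book.SerializeToOstream(&out);
}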

Hope this helps!
