Efficient way to save objects to binary files

I have a class that consists mainly of a matrix of vectors: vector< MyFeatVector<T> > m_vCells, where the outer vector represents the matrix. Each element of this matrix is in turn a vector (I extended the STL vector class and named it MyFeatVector<T>).

I am trying to create an efficient method for storing objects of this class in binary files. So far I need three nested loops:

foutput.write( reinterpret_cast<char*>( &(this->at(dy,dx,dz)) ), sizeof(T) );

where this->at(dy,dx,dz) retrieves element dz of the vector at position [dy,dx].

Is it possible to save the private member m_vCells without using loops? I tried something like: foutput.write(reinterpret_cast<char*>(&(this->m_vCells[0])), (this->m_vCells.size())*sizeof(CFeatureVector<T>)); which does not seem to work correctly. It can be assumed that all vectors in this matrix have the same size, although a more general solution is also welcome :-)

Also, with the nested-loop implementation, storing objects of this class in binary files takes more disk space than storing the same objects in plain text files, which is a bit strange.

I tried to follow the suggestion at http://forum.allaboutcircuits.com/showthread.php?t=16465 , but could not arrive at a working solution.

Thanks!

Below is a simplified example of my serialization and unserialization methods.

    template < typename T >
    bool MyFeatMatrix<T>::writeBinary( const string & ofile ){

        ofstream foutput(ofile.c_str(), ios::out|ios::binary);
        foutput.write(reinterpret_cast<char*>(&this->m_nHeight), sizeof(int));
        foutput.write(reinterpret_cast<char*>(&this->m_nWidth), sizeof(int));
        foutput.write(reinterpret_cast<char*>(&this->m_nDepth), sizeof(int));

        //foutput.write(reinterpret_cast<char*>(&(this->m_vCells[0])), nSze*sizeof(CFeatureVector<T>));
        for(register int dy=0; dy < this->m_nHeight; dy++){
            for(register int dx=0; dx < this->m_nWidth; dx++){
                for(register int dz=0; dz < this->m_nDepth; dz++){
                    foutput.write( reinterpret_cast<char*>( &(this->at(dy,dx,dz)) ), sizeof(T) );
                }
            }
        }

        foutput.close();
        return true;
    }

    template < typename T >
    bool MyFeatMatrix<T>::readBinary( const string & ifile ){

        ifstream finput(ifile.c_str(), ios::in|ios::binary);

        int nHeight, nWidth, nDepth;
        finput.read(reinterpret_cast<char*>(&nHeight), sizeof(int));
        finput.read(reinterpret_cast<char*>(&nWidth), sizeof(int));
        finput.read(reinterpret_cast<char*>(&nDepth), sizeof(int));

        this->resize(nHeight, nWidth, nDepth);

        for(register int dy=0; dy < this->m_nHeight; dy++){
            for(register int dx=0; dx < this->m_nWidth; dx++){
                for(register int dz=0; dz < this->m_nDepth; dz++){
                    finput.read( reinterpret_cast<char*>( &(this->at(dy,dx,dz)) ), sizeof(T) );
                }
            }
        }

        finput.close();
        return true;
    }
4 answers

The most efficient method is to store the objects in an array (or other contiguous memory) and then write that buffer to the file in one go. The advantage is that the disk does not waste time spinning up between many small writes, and the writing can be performed contiguously rather than at random locations.
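
A minimal sketch of that buffer-first approach, assuming the cells hold a trivially copyable type (here `float`) and that all inner vectors have the same length, as the question allows. The helper names `pack_cells`, `write_buffer` and `read_buffer` are hypothetical and not part of the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: copy every inner vector into one contiguous buffer.
std::vector<float> pack_cells(const std::vector<std::vector<float> >& cells)
{
    std::vector<float> buffer;
    for (std::size_t i = 0; i < cells.size(); ++i) {
        // Assumption taken from the question: all inner vectors have equal size.
        assert(cells[i].size() == cells[0].size());
        buffer.insert(buffer.end(), cells[i].begin(), cells[i].end());
    }
    return buffer;
}

// Write the packed buffer with a single write() call.
bool write_buffer(const std::string& path, const std::vector<float>& buffer)
{
    std::ofstream out(path.c_str(), std::ios::out | std::ios::binary);
    if (!out) return false;
    if (!buffer.empty())
        out.write(reinterpret_cast<const char*>(&buffer[0]),
                  static_cast<std::streamsize>(buffer.size() * sizeof(float)));
    return out.good();
}

// Read `count` floats back with a single read() call; returns an empty
// vector if the file cannot be opened or is too short.
std::vector<float> read_buffer(const std::string& path, std::size_t count)
{
    std::vector<float> buffer(count);
    std::ifstream in(path.c_str(), std::ios::in | std::ios::binary);
    if (count != 0 &&
        !in.read(reinterpret_cast<char*>(&buffer[0]),
                 static_cast<std::streamsize>(count * sizeof(float))))
        buffer.clear();
    return buffer;
}
```

The file produced this way is only portable between programs that agree on the size and byte order of `float`; the matrix dimensions would still have to be stored separately, as the question's header fields already do.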

If this is a performance bottleneck, you might consider using multiple threads, with one additional thread dedicated to output. Dump the objects into a buffer and set a flag; the writing thread then handles the output, freeing the main task for more important work.

Edit 1: Serialization Example
The following code has not been compiled and is for illustrative purposes only.

    #include <cstddef>
    #include <cstring>
    #include <fstream>
    #include <string>

    class binary_stream_interface
    {
    public:
        virtual ~binary_stream_interface() {}
        virtual void load_from_buffer(const unsigned char *& buf_ptr) = 0;
        virtual std::size_t size_on_stream() const = 0;
        virtual void store_to_buffer(unsigned char *& buf_ptr) const = 0;
    };

    struct Pet : public binary_stream_interface
    {
        // Fixed width of the name field on the stream, including the final '\0'.
        enum { max_name_length = 32 };

        std::string name;
        unsigned int age;

        Pet() : age(0) {}

        void load_from_buffer(const unsigned char *& buf_ptr)
        {
            std::memcpy(&age, buf_ptr, sizeof(age));
            buf_ptr += sizeof(age);
            // The name field is always '\0'-terminated by store_to_buffer.
            name = std::string(reinterpret_cast<const char *>(buf_ptr));
            buf_ptr += max_name_length;
        }

        std::size_t size_on_stream() const
        {
            return sizeof(age) + max_name_length;
        }

        void store_to_buffer(unsigned char *& buf_ptr) const
        {
            std::memcpy(buf_ptr, &age, sizeof(age));
            buf_ptr += sizeof(age);
            // Pad the field with '\0' first, then copy at most
            // max_name_length - 1 characters so a terminator always remains.
            std::memset(buf_ptr, 0, max_name_length);
            std::strncpy(reinterpret_cast<char *>(buf_ptr), name.c_str(), max_name_length - 1);
            buf_ptr += max_name_length;
        }
    };

    int main()
    {
        Pet dog;
        dog.name = "Fido";
        dog.age = 5;
        std::ofstream data_file("pet_data.bin", std::ios::binary);

        // Determine size of buffer
        std::size_t buffer_size = dog.size_on_stream();

        // Allocate the buffer
        unsigned char * buffer = new unsigned char [buffer_size];
        unsigned char * buf_ptr = buffer;

        // Write / store the object into the buffer.
        dog.store_to_buffer(buf_ptr);

        // Write the buffer to the file / stream.
        data_file.write(reinterpret_cast<const char *>(buffer), buffer_size);

        data_file.close();
        delete [] buffer;
        return 0;
    }

Edit 2: A class with a vector of strings

    #include <cstddef>
    #include <cstring>
    #include <string>
    #include <vector>

    // Uses binary_stream_interface from the previous example.
    class Many_Strings : public binary_stream_interface
    {
        enum {MAX_STRING_SIZE = 32};

    public:
        std::size_t size_on_stream() const
        {
            return m_string_container.size() * MAX_STRING_SIZE  // Total size of strings.
                   + sizeof(std::size_t);  // with room for the quantity variable.
        }

        void store_to_buffer(unsigned char *& buf_ptr) const
        {
            // Treat the vector<string> as a variable length field.
            // Store the quantity of strings into the buffer,
            // followed by the content.
            std::size_t string_quantity = m_string_container.size();
            std::memcpy(buf_ptr, &string_quantity, sizeof(string_quantity));
            buf_ptr += sizeof(string_quantity);

            for (std::size_t i = 0; i < string_quantity; ++i)
            {
                // Each string is a fixed length field.
                // Pad with '\0' first, then copy the data,
                // always leaving room for one terminating '\0'.
                std::memset(buf_ptr, 0, MAX_STRING_SIZE);
                std::strncpy(reinterpret_cast<char *>(buf_ptr),
                             m_string_container[i].c_str(),
                             MAX_STRING_SIZE - 1);
                buf_ptr += MAX_STRING_SIZE;
            }
        }

        void load_from_buffer(const unsigned char *& buf_ptr)
        {
            // Clear the container, then load the quantity variable.
            m_string_container.clear();
            std::size_t string_quantity = 0;
            std::memcpy(&string_quantity, buf_ptr, sizeof(string_quantity));
            buf_ptr += sizeof(string_quantity);

            // Load each fixed length string and advance by MAX_STRING_SIZE.
            for (std::size_t i = 0; i < string_quantity; ++i)
            {
                m_string_container.push_back(
                    std::string(reinterpret_cast<const char *>(buf_ptr)));
                buf_ptr += MAX_STRING_SIZE;
            }
        }

        std::vector<std::string> m_string_container;
    };

I suggest you read the C++ FAQ on serialization and choose what works best for you.

When you work with structures and classes, you have to take care of two things.

  • Pointers inside a class
  • Padding bytes

Both of these can produce nasty results in your output. IMO, the object itself should implement its own serialization and deserialization. The object knows its own structure, pointer data, etc., so it can decide which format can be implemented efficiently.

In any case, you will have to iterate somewhere, or wrap the iteration in a function. Once you have implemented the serialization and deserialization functions (either as operators or as plain functions), passing an object around becomes simple. Overloading the << and >> operators in particular makes this easy when you work with objects.
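
A minimal sketch of the operator overloading mentioned above, using a text format for simplicity. The `Cell` struct and the `round_trip` helper are hypothetical names invented for illustration; they do not appear in the question's code.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>

// Hypothetical element type with two plain fields.
struct Cell
{
    int id;
    double weight;
};

// Serialize: write the fields separated by a space.
std::ostream& operator<<(std::ostream& os, const Cell& c)
{
    return os << c.id << ' ' << c.weight;
}

// Deserialize: read the fields back in the same order.
std::istream& operator>>(std::istream& is, Cell& c)
{
    return is >> c.id >> c.weight;
}

// Send a Cell through a string stream and return what comes out.
Cell round_trip(const Cell& in)
{
    std::stringstream ss;
    ss << in;
    Cell out = {0, 0.0};
    ss >> out;
    assert(!ss.fail());  // the sketch assumes extraction succeeds
    return out;
}
```

Because the operators take `std::ostream` and `std::istream`, the same code works unchanged with `std::ofstream` and `std::ifstream`. Note that the default stream precision may truncate floating-point values that need more than six significant digits.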

As for your question about writing directly from the vector's underlying pointer: it might work for a single vector, but not for a vector of vectors, and it is not a good idea in general.
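
A small sketch of why the single-vector case differs from the nested case, under the assumption that the element type is a plain `int`. The function names are hypothetical. The vector object itself is a small fixed-size handle, so writing `size() * sizeof(inner vector)` bytes from the outer vector dumps those handles (including their internal pointers), not the elements.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <vector>

// sizeof(std::vector<int>) does not depend on how many elements it holds.
bool handle_size_is_fixed()
{
    std::vector<int> empty;
    std::vector<int> big(1000, 42);
    return sizeof(empty) == sizeof(big);
}

// Write one vector<int>: the element count first, then the raw elements.
void write_vector(std::ostream& os, const std::vector<int>& v)
{
    const std::size_t n = v.size();
    os.write(reinterpret_cast<const char*>(&n), sizeof(n));
    if (n != 0)
        os.write(reinterpret_cast<const char*>(&v[0]),
                 static_cast<std::streamsize>(n * sizeof(int)));
}

// Read a vector<int> written by write_vector; empty on failure.
std::vector<int> read_vector(std::istream& is)
{
    std::size_t n = 0;
    if (!is.read(reinterpret_cast<char*>(&n), sizeof(n)))
        return std::vector<int>();
    std::vector<int> v(n);
    if (n != 0 &&
        !is.read(reinterpret_cast<char*>(&v[0]),
                 static_cast<std::streamsize>(n * sizeof(int))))
        v.clear();
    return v;
}
```

This only works for trivially copyable element types, and the resulting bytes depend on the platform's `int` and `std::size_t` sizes and byte order.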


Update, following the updated question.

There are a few things you should keep in mind before extending STL containers. They are not good candidates for inheritance because they do not have virtual destructors. If you only use basic data types and POD-like structures, this will not cause big problems. But if you use the derived class in a truly object-oriented way, you may run into some unpleasant behavior.

As for your code

  • Why are you casting it to char*?
  • How you serialize an object is your choice. IMO what you did is a basic file write operation in the name of serialization.
  • Serialization is done on the object, i.e. the "T" parameter in your template. If you use POD or basic types, no special serialization is needed. Otherwise, you must carefully choose how the object is written.
  • The choice between text format and binary format is yours. A text format always has a cost, but it is easier to manipulate than a binary format.

For example, the following code is for a simple read and write operation (in text format).

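    // arr and arrout are arrays declared elsewhere; _countof is an MSVC macro.
    fstream fr("test.txt", ios_base::out | ios_base::binary);
    for (int i = 0; i < _countof(arr); i++)
        fr << arr[i] << ' ';
    fr.close();

    fstream fw("test.txt", ios_base::in | ios_base::binary);
    int j = 0;
    // Stop when the output array is full or extraction fails (e.g. at end of file).
    while (j < _countof(arrout) && fw >> arrout[j])
        ++j;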
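
The cost trade-off between text and binary also bears on the question's observation that the binary file came out larger. A rough sketch for `int` values, with hypothetical helper names: a fixed-width binary record always costs `sizeof(int)` bytes per value, while space-separated text costs one byte per digit plus the separator, so small values are cheaper as text and large values are cheaper as binary.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <vector>

// Bytes needed to store the values as space-separated decimal text.
std::size_t text_size(const std::vector<int>& values)
{
    std::ostringstream os;
    for (std::size_t i = 0; i < values.size(); ++i)
        os << values[i] << ' ';
    assert(os.good());  // formatting into memory is expected to succeed
    return os.str().size();
}

// Bytes needed to store the values as raw fixed-width binary.
std::size_t binary_size(const std::vector<int>& values)
{
    return values.size() * sizeof(int);
}
```

For example, one hundred zeros take 200 bytes as text, while one hundred copies of 1000000000 take 1100 bytes as text; the binary size is the same in both cases.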

It seems to me that the most direct route to creating a binary file containing a vector is to memory-map the file and place the vector in the mapped region. As sarat pointed out, you need to worry about how pointers are used in the class. The Boost.Interprocess library has a tutorial on how to do this using its shared memory regions, which include memory-mapped files.


First, have you looked at Boost.MultiArray? It is always better to use something ready-made than to reinvent the wheel.

That said, I'm not sure whether this is useful, but here is how I would implement the basic data structure so that it is fairly easy to serialize:

    #include <array>
    #include <cstddef>

    template <typename T, std::size_t DIM1, std::size_t DIM2, std::size_t DIM3>
    class ThreeDArray
    {
        typedef std::array<T, DIM1 * DIM2 * DIM3> array_t;
        array_t m_data;

    public:

        inline std::size_t size() const { return m_data.size(); }
        inline std::size_t byte_size() const { return sizeof(T) * m_data.size(); }

        // Element (i, j, k) lives at a fixed offset in the flat array.
        inline T & operator()(std::size_t i, std::size_t j, std::size_t k)
        {
            return m_data[i + j * DIM1 + k * DIM1 * DIM2];
        }

        inline const T & operator()(std::size_t i, std::size_t j, std::size_t k) const
        {
            return m_data[i + j * DIM1 + k * DIM1 * DIM2];
        }

        // Pointer to the contiguous storage, suitable for a single write().
        inline const T * data() const { return m_data.data(); }
    };

You can directly serialize the data buffer:

    ThreeDArray<int, 4, 6, 11> arr;
    /* ... */
    std::ofstream outfile("file.bin", std::ios::binary);
    outfile.write(reinterpret_cast<const char*>(arr.data()), arr.byte_size());
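
Since the question resizes its matrix at run time, the same idea can be sketched with a flat `std::vector` instead of `std::array`. This is only an illustration under stated assumptions: `FlatMatrix3D` and its members are hypothetical names, `T` is assumed to be trivially copyable (and not `bool`), and the file is read on the same platform that wrote it.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical runtime-sized matrix: all cells live in one flat vector.
template <typename T>
class FlatMatrix3D
{
public:
    FlatMatrix3D() : m_h(0), m_w(0), m_d(0) {}
    FlatMatrix3D(std::size_t h, std::size_t w, std::size_t d)
        : m_h(h), m_w(w), m_d(d), m_data(h * w * d) {}

    // Element dz of the vector at position [dy, dx].
    T& at(std::size_t dy, std::size_t dx, std::size_t dz)
    {
        assert(dy < m_h && dx < m_w && dz < m_d);
        return m_data[(dy * m_w + dx) * m_d + dz];
    }

    // Write the three dimensions, then every cell with one write() call.
    bool write(const std::string& path) const
    {
        std::ofstream out(path.c_str(), std::ios::out | std::ios::binary);
        const std::size_t dims[3] = { m_h, m_w, m_d };
        out.write(reinterpret_cast<const char*>(dims), sizeof(dims));
        if (!m_data.empty())
            out.write(reinterpret_cast<const char*>(&m_data[0]),
                      static_cast<std::streamsize>(m_data.size() * sizeof(T)));
        return out.good();
    }

    // Read the dimensions, resize, then read every cell with one read() call.
    bool read(const std::string& path)
    {
        std::ifstream in(path.c_str(), std::ios::in | std::ios::binary);
        std::size_t dims[3] = { 0, 0, 0 };
        if (!in.read(reinterpret_cast<char*>(dims), sizeof(dims)))
            return false;
        m_h = dims[0];
        m_w = dims[1];
        m_d = dims[2];
        m_data.assign(m_h * m_w * m_d, T());
        if (!m_data.empty())
            in.read(reinterpret_cast<char*>(&m_data[0]),
                    static_cast<std::streamsize>(m_data.size() * sizeof(T)));
        return in.good();
    }

private:
    std::size_t m_h, m_w, m_d;
    std::vector<T> m_data;
};
```

With this layout the three nested loops from the question collapse into a single `write` and a single `read`, at the cost of giving up per-cell vectors of differing length.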
