C++ disk write and read performance

Possible duplicate:
Writing a binary file in C++ very fast

I have a large number of unsigned 32-bit integers in memory (1.5 billion entries). I need to write them to a file and later read them back into main memory.

Now I do this using:

    ofstream ofs;
    ofs.open(filename);
    for (uint64_t i = 0; i < 1470000000; i++)
        ofs << integers[i] << " ";

and

    ifstream ifs;
    ifs.open(filename);
    for (uint64_t i = 0; i < 1470000000; i++)
        ifs >> integers[i];

It takes a few minutes. Is there any library method to make this faster, or any suggestion for how I could run a performance test? Can someone show me simple C++ code that uses mmap to do the above (on Linux)?

EDIT: EXAMPLE CASE

    #include <iostream>
    #include <stdint.h>
    #include <cstdio>
    #include <cstdlib>
    #include <sstream>
    using namespace std;

    int main()
    {
        uint32_t* ele = new uint32_t[100];
        for (int i = 0; i < 100; i++)
            ele[i] = i;
        for (int i = 0; i < 100; i++) {
            if (ele[i] < 20)
                continue;
            // else: write ele[i] to file
        }
        for (int i = 0; i < 100; i++) {
            if (ele[i] < 20)
                continue;
            // else: read number from file
            // ele[i] = number * 10;
        }
        std::cin.get();
    }
4 answers

The first thing to do is determine where the time is going. Formatting and parsing text is non-trivial and can take some time, but so can the actual writes and reads, given the size of the file. The second thing is to determine what the data must be: the fastest solution is almost certainly to mmap (or the Windows equivalent) the array to a file directly, and never read or write at all. This does not provide a portable representation, however, and even a compiler change can make the data unreadable. (It is unlikely that ints are not 32 bits today, but it has happened in the past.)

In general, if the time is going into the reads and writes, you will want to explore using mmap . If it is going into the formatting and parsing, you will want to look into a binary format - which can also help the reads and writes if it reduces the size of the resulting file. The simplest binary format, writing the values in standard network byte order, requires no more than:

    void writeInt( std::ostream& dest, int32_t integer )
    {
        dest.put( (integer >> 24) & 0xFF );
        dest.put( (integer >> 16) & 0xFF );
        dest.put( (integer >>  8) & 0xFF );
        dest.put( (integer      ) & 0xFF );
    }

    int32_t readInt( std::istream& source )
    {
        int32_t results = 0;
        results  = source.get() << 24;
        results |= source.get() << 16;
        results |= source.get() <<  8;
        results |= source.get();
        return results;
    }

(Some error checking should obviously be added.)

If many of the integers are actually small, you might try some variable-length encoding, such as the one used in Google Protocol Buffers. If most of your integers are in the range -64...63, this could result in the file being only a quarter of the size (which again will improve read and write times).


If you know the size, just fwrite / fread the array.


You can get better performance by using a larger buffer for your input and output streams:

    ofstream ofs;
    char* obuffer = new char[bufferSize];
    ofs.rdbuf()->pubsetbuf(obuffer, bufferSize);
    ofs.open(filename);

    ifstream ifs;
    char* ibuffer = new char[bufferSize];
    ifs.rdbuf()->pubsetbuf(ibuffer, bufferSize);
    ifs.open(filename);

Also, ifs >> integers ; is a pretty slow way to parse integers. Try reading lines and then parsing them with std::strtol() . IME it is noticeably faster.


If you just want to copy, you can use this for better performance:

    std::ifstream input("input");
    std::ofstream output("output");
    output << input.rdbuf();

or perhaps setting the buffer size may increase the speed:

    char cbuf[buf_size];
    ifstream fin;
    fin.rdbuf()->pubsetbuf(cbuf, buf_size);

In my answer I did not address the question of the integers' size, because I do not see why it should affect stream performance, but I hope this helps.

