Recently I decided to optimize some file reading I was doing, because, as everyone says, reading a large chunk of data into a buffer and then working with it is faster than doing lots of small reads. My code is certainly much faster now, but after some profiling it appears that memcpy is taking up a lot of time.
The gist of my code is:
    ifstream file("some huge file");
    char buffer[0x1000000];
    for (yada yada) {
        int size = some arbitrary size usually around a megabyte;
        file.read(buffer, size);
        // Do stuff with buffer
    }
I'm using Visual Studio 11, and after profiling my code it says ifstream::read() eventually calls xsgetn(), which copies from the stream's internal buffer to my buffer. This operation takes up over 80% of the time! In second place comes uflow(), which takes up 10% of the time.
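To double-check what the profiler reports without a profiler, the call path can be observed with a small instrumented buffer. This is only a sketch: CountingFilebuf and read_counted are names I made up for illustration, and it assumes that istream::read() forwards to the buffer's sgetn()/xsgetn(), which is what the standard library implementations I know of do.

```cpp
#include <cstddef>
#include <fstream>
#include <ios>
#include <istream>
#include <string>

// Hypothetical helper: a std::filebuf that records how often xsgetn() is
// called and how many bytes were requested through it.
class CountingFilebuf : public std::filebuf {
public:
    std::size_t calls = 0;
    std::streamsize requested = 0;

protected:
    std::streamsize xsgetn(char* dest, std::streamsize count) override {
        ++calls;
        requested += count;
        // Delegate to the normal implementation, which does the real work.
        return std::filebuf::xsgetn(dest, count);
    }
};

// Reads up to `size` bytes from `path` into `dest` through istream::read().
// Returns the number of bytes that arrived, or -1 if the file did not open.
// The xsgetn() statistics end up in `buf`.
std::streamsize read_counted(const std::string& path, char* dest,
                             std::streamsize size, CountingFilebuf& buf) {
    if (!buf.open(path.c_str(), std::ios::in | std::ios::binary)) {
        return -1;
    }
    std::istream in(&buf);
    in.read(dest, size);
    return in.gcount();
}
```

After a call to read_counted(), buf.calls and buf.requested show whether the read went through xsgetn() and how much was asked for in one go.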
Is there any way to get around this copying? Can I somehow tell the ifstream to read the amount I need directly into my buffer? Does the C-style FILE* also use such an internal buffer?
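The closest standard hook I'm aware of is pubsetbuf() on the stream's filebuf, which asks the stream to use a caller-supplied buffer (or none at all). Below is a rough sketch of that experiment, not a known fix: open_with_buffer and read_all are hypothetical helper names, the effect of setbuf is implementation-defined, and whether the request must be made before or after open() differs between standard libraries, so the ordering shown here is an assumption to verify on your compiler.

```cpp
#include <cstddef>
#include <fstream>
#include <ios>
#include <string>

// Hypothetical helper: asks the stream's filebuf to use `storage` as its
// internal buffer, then opens `path`. Passing (nullptr, 0) requests
// unbuffered I/O instead. `storage` must outlive the stream.
// Assumption: the request is honored when made before open(); some
// implementations only honor it on an already-open file.
bool open_with_buffer(std::ifstream& file, const std::string& path,
                      char* storage, std::streamsize storage_size) {
    file.rdbuf()->pubsetbuf(storage, storage_size);
    file.open(path.c_str(), std::ios::in | std::ios::binary);
    return file.is_open();
}

// Reads the whole file in `chunk`-sized pieces (chunk must be > 0) and
// returns the total number of bytes read.
std::size_t read_all(std::ifstream& file, char* dest, std::streamsize chunk) {
    std::size_t total = 0;
    // A short final read sets failbit but still reports its bytes in gcount().
    while (file.read(dest, chunk) || file.gcount() > 0) {
        total += static_cast<std::size_t>(file.gcount());
    }
    return total;
}
```

Profiling this variant against the default-buffered stream would show whether the xsgetn() copy actually goes away on a given implementation; it may not.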
UPDATE: Since people keep telling me to use cstdio, I ran a benchmark.
EDIT: Unfortunately, the old code was full of bugs (it didn't even read the whole file!). You can see it here: http://pastebin.com/4dGEQ6S7
Here is my new benchmark:
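    #include <windows.h>
    #include <cstdio>
    #include <cstdlib>
    #include <ctime>
    #include <fstream>
    #include <iostream>
    #include <string>
    using namespace std;

    const int MAX = 0x10000;
    char buf[MAX];
    string fpath = "largefile";

    int main() {
        {   // Test 1: ifstream::read
            clock_t start = clock();
            ifstream file(fpath, ios::binary);
            // Stop on the first short (or failed) read instead of polling eof().
            while (file.read(buf, MAX)) {
            }
            clock_t end = clock();
            cout << end - start << endl;
        }
        {   // Test 2: C-style fread
            clock_t start = clock();
            FILE* file = fopen(fpath.c_str(), "rb");
            setvbuf(file, NULL, _IOFBF, 1024);
            while (fread(buf, 0x1, MAX, file) == MAX) {
            }
            fclose(file);
            clock_t end = clock();
            cout << end - start << endl;
        }
        {   // Test 3: Win32 ReadFile
            clock_t start = clock();
            // CreateFileA because fpath is a narrow string; OPEN_EXISTING so a
            // missing file is not silently created.
            HANDLE file = CreateFileA(fpath.c_str(), GENERIC_READ, FILE_SHARE_READ,
                                      NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
            while (true) {
                DWORD used = 0;
                ReadFile(file, buf, MAX, &used, NULL);
                if (used < MAX)
                    break;
            }
            CloseHandle(file);
            clock_t end = clock();
            cout << end - start << endl;
        }
        system("PAUSE");
    }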
Times (clock() ticks, in the order the tests run):
    ifstream::read  185
    fread           80
    ReadFile        78
Well... it looks like the C-style fread calls are faster than ifstream::read. Also, using the Windows ReadFile gives only a slight advantage, which is negligible (I looked at the code, and fread is basically a wrapper around ReadFile). It looks like I'll be switching to fread after all.
Man, it is tricky to write a benchmark that actually tests this stuff correctly.
CONCLUSION: Using <cstdio> is faster than <fstream>. The reason fstream is slower is that C++ streams have their own internal buffer. This causes extra copying whenever you read or write, and that copying accounts for the extra time taken by the stream. Even more shocking is that the extra time is more than the time taken to actually read the file.
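For anyone who wants to repeat the comparison on another platform, here is a minimal portable sketch of the two standard-library cases. ReadResult, time_ifstream and time_fread are names I made up for illustration; unlike the benchmark above it uses std::chrono::steady_clock (wall time rather than clock() ticks) and returns the byte count so you can confirm both paths read the whole file.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <fstream>
#include <ios>
#include <string>
#include <vector>

// Hypothetical result type: bytes actually read and elapsed wall time.
struct ReadResult {
    std::size_t bytes;
    double milliseconds;
};

// Reads `path` in `chunk`-sized pieces with ifstream::read().
ReadResult time_ifstream(const std::string& path, std::size_t chunk) {
    if (chunk == 0) return {0, 0.0};  // a zero-sized read would never finish
    std::vector<char> buf(chunk);
    const auto start = std::chrono::steady_clock::now();
    std::size_t total = 0;
    std::ifstream file(path.c_str(), std::ios::in | std::ios::binary);
    // A short final read sets failbit but still reports its bytes in gcount().
    while (file.read(buf.data(), static_cast<std::streamsize>(chunk)) ||
           file.gcount() > 0) {
        total += static_cast<std::size_t>(file.gcount());
    }
    const auto stop = std::chrono::steady_clock::now();
    return {total,
            std::chrono::duration<double, std::milli>(stop - start).count()};
}

// Reads `path` in `chunk`-sized pieces with fread().
ReadResult time_fread(const std::string& path, std::size_t chunk) {
    if (chunk == 0) return {0, 0.0};
    std::vector<char> buf(chunk);
    const auto start = std::chrono::steady_clock::now();
    std::size_t total = 0;
    if (std::FILE* file = std::fopen(path.c_str(), "rb")) {
        std::size_t got = 0;
        while ((got = std::fread(buf.data(), 1, chunk, file)) > 0) {
            total += got;
        }
        std::fclose(file);
    }
    const auto stop = std::chrono::steady_clock::now();
    return {total,
            std::chrono::duration<double, std::milli>(stop - start).count()};
}
```

Whichever function runs second is likely to benefit from the operating system's file cache, so it is worth running each one several times and alternating the order before trusting the numbers.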