How does the compiler optimize getline() so efficiently?

I know compiler optimization can be quite esoteric, but my example is so simple that I'd like to see if anyone can explain what it's doing.

I have a 500 MB text file. I declare and initialize an fstream:

    std::fstream file(path, std::ios::in);

I need to read the file sequentially. It's delimited, but the field lengths are unknown and vary between lines. The actual parsing I need to do on each line added very little to the total time (which really surprised me, since I was doing string::find on each line from getline; I thought that would be slow).

In short, I want to search every line for a string and break the loop when I find it. I also increment and print line numbers out of curiosity; I confirmed this adds a little time (5 seconds or so), and it lets me watch it fly through short lines and slow down on long ones.

The text I'm searching for is a unique string near the end of the file, so I need to search every line. I'm writing this on my phone, so I apologize for any formatting problems, but it's pretty simple. I have a function that takes my stream by reference, plus the text to find as a string, and returns a std::size_t.

    long long int lineNum = 0;
    while (std::getline(file, line)) {
        pos = line.find(text);
        lineNum += 1;
        std::cout << std::to_string(lineNum) << std::endl;
        if (pos != std::string::npos)
            return file.tellg();
    }
    return std::string::npos;
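For reference, here is a self-contained, compilable sketch of the loop above. The function name findTextPos is my own, I take a std::istream& rather than std::fstream& so it works with any input stream, and I've dropped the per-line printing; otherwise it follows the snippet:

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Sketch of the function described above (hypothetical name).
// Returns the stream position just past the matching line,
// or std::string::npos if the text is never found.
std::size_t findTextPos(std::istream& file, const std::string& text) {
    std::string line;
    long long lineNum = 0;
    while (std::getline(file, line)) {
        std::size_t pos = line.find(text);
        lineNum += 1;  // original also printed this each iteration
        if (pos != std::string::npos)
            return static_cast<std::size_t>(file.tellg());
    }
    return std::string::npos;
}
```

Note that if the match is on the very last line of a file with no trailing newline, tellg() can return -1 after getline hits EOF; the sketch ignores that edge case.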

Edit: lingxi pointed out that the to_string is not needed here, thanks. As mentioned, completely eliminating the line-number computation and output saves a few seconds, which in my pre-optimization numbers is a small percentage of the total.

This successfully walks every line and returns the end position in 408 seconds. I saw only minimal improvement from pulling the file into a stringstream, or from stripping everything out of the loop (just getline to the end: no checks, no searching, no output). Pre-reserving a huge capacity for the line string didn't help either.

It seems that getline completely dominates the runtime. However... if I compile with the /O2 flag (MSVC++), I get a comically faster 26 seconds. On top of that, there is no longer any noticeable difference between long lines and short ones. Obviously the compiler is doing something very different. No complaints from me, but any thoughts on how this is achieved? As an exercise, I'd like to try to make my code run this fast without relying on compiler optimization.

I'm sure this has something to do with how getline manipulates the string. Would it be faster (alas, I can't test for a while) to just reserve the whole file into one string and read it character by character, incrementing the line count whenever I pass a '\n'? Also, would the compiler use things like mmap?
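As a rough sketch of that idea, here is one way to slurp the entire file into a single string, find the target with one search, and count '\n' characters up to the hit to recover the line number. The function name is hypothetical, and it assumes the target never spans a newline:

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical name. Returns the 1-based line number containing `text`,
// or 0 if it is not found. Assumes `text` does not span a newline.
long long findInWholeFile(const std::string& path, const std::string& text) {
    std::ifstream file(path, std::ios::in | std::ios::binary);
    std::ostringstream ss;
    ss << file.rdbuf();                 // one bulk read instead of per-line getline
    const std::string data = ss.str();

    std::size_t hit = data.find(text);  // single scan over the whole buffer
    if (hit == std::string::npos)
        return 0;

    long long lineNum = 1;
    for (std::size_t i = 0; i < hit; ++i)
        if (data[i] == '\n')
            ++lineNum;                  // count newlines before the match
    return lineNum;
}
```

Whether this beats the optimized getline loop would need measuring; it trades per-line stream calls for one large allocation and two linear passes.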

UPDATE: I'll post the code when I get home tonight. It seems that just disabling runtime checking cut execution from 400 seconds to 50! I also tried the same functionality using C-style arrays. I'm not super experienced, but it was easy enough to dump the data into a char array and loop through it looking for newlines or the first letter of my target string.

Even in full debug mode, it reaches the end and correctly finds the line in 54 seconds, versus 26 seconds and 20 seconds in the optimized builds. So, from my informal, ad-hoc experiments, it seems the string and stream functions fall prey to runtime checks? Again, I'll double-check when I get home.
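A sketch of that C-style approach might look like the following. The function name is my invention; it dumps the file into one char buffer with fread, then walks it line by line using strchr for newlines and strstr for the target:

```cpp
#include <cstdio>
#include <cstring>
#include <vector>

// Hypothetical name. Dumps the file into a char buffer, then scans for
// newlines and the target. Returns the 1-based line number of the first
// line containing `text`, or 0 if absent.
long long countLinesToMatch(const char* path, const char* text) {
    std::FILE* f = std::fopen(path, "rb");
    if (!f) return 0;
    std::fseek(f, 0, SEEK_END);
    long size = std::ftell(f);
    std::fseek(f, 0, SEEK_SET);
    std::vector<char> buf(static_cast<std::size_t>(size) + 1);
    std::fread(buf.data(), 1, static_cast<std::size_t>(size), f);
    std::fclose(f);
    buf[static_cast<std::size_t>(size)] = '\0';  // terminate for str* functions

    long long lineNum = 1;
    const char* p = buf.data();
    while (*p) {
        const char* nl  = std::strchr(p, '\n');
        const char* end = nl ? nl : p + std::strlen(p);
        const char* hit = std::strstr(p, text);  // may land past this line
        if (hit && hit < end)                    // only count in-line matches
            return lineNum;
        if (!nl) break;
        p = nl + 1;
        ++lineNum;
    }
    return 0;
}
```

This does no per-byte function calls through the iostream layers, which is presumably why it stays fast even in debug builds.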

+5
1 answer

The reason for the dramatic speedup is that the iostream class hierarchy is built on templates (std::ostream is actually a typedef of a template called std::basic_ostream), and most of its code lives in headers. C++ iostreams make several function calls to process each byte of a stream, but most of those functions are fairly trivial. With optimization enabled, most of these calls are inlined, which exposes to the compiler the fact that std::getline essentially copies characters from one buffer to another until it finds a newline; normally that loop is "hidden" under several levels of function calls. Once visible, it can be optimized further, reducing the per-byte overhead by an order of magnitude.
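To make the "several function calls per byte" point concrete, here is a hedged sketch of roughly what getline reduces to: a loop pulling one character at a time from the stream's buffer via sbumpc(). This is an illustration of the mechanism, not the actual library implementation (sentry objects, locale delimiter handling, and length limits are omitted):

```cpp
#include <istream>
#include <sstream>
#include <string>

// Rough sketch of the character loop std::getline boils down to.
// Each sbumpc() is a function call per byte; with optimization those
// calls inline down to a buffer-pointer increment, which is where the
// order-of-magnitude speedup comes from.
std::istream& getlineSketch(std::istream& is, std::string& line) {
    line.clear();
    std::streambuf* sb = is.rdbuf();
    while (true) {
        int c = sb->sbumpc();  // one call per byte when not inlined
        if (c == std::streambuf::traits_type::eof()) {
            is.setstate(std::ios::eofbit);
            if (line.empty())
                is.setstate(std::ios::failbit);
            break;
        }
        if (c == '\n')         // delimiter is consumed but not stored
            break;
        line.push_back(static_cast<char>(c));
    }
    return is;
}
```

Compiled without optimization, every iteration really does pay for the sbumpc call (and, below it, the occasional virtual uflow() to refill the buffer); with /O2 the common path collapses to reading straight out of the buffer.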

The buffering behavior does not actually change between the optimized and unoptimized builds; otherwise the speedup would be even greater.

+1
