QTextStream's behavior for finding a string is not as expected

I have a few lines of code:

    QFile file("h:/test.txt");
    file.open(QFile::ReadOnly | QFile::Text);
    QTextStream in(&file);

    bool found = false;
    uint pos = 0;

    do {
        QString temp = in.readLine();
        int p = temp.indexOf("something");
        if (p < 0) {
            pos += temp.length() + 1;
        } else {
            pos += p;
            found = true;
        }
    } while (!found && !in.atEnd());

    in.seek(0);
    QString text = in.read(pos);
    cout << text.toStdString() << endl;

The idea is to search a text file for a specific character sequence and, if it is found, load the file from the very beginning up to the point where the searched text appears. The input I used for testing was:

    this is line one, the first line
    this is line two, it is second
    this is the third line
    and this is line 4
    line 5 goes here
    and finally, there is line number 6

And here's the weird part: if the search string is on any line except the last, I get the expected behavior. It works great.

BUT, if I search for a string that is on the last line (line 6), the result is always 5 characters short. If it were the 7th line, the result would be 6 characters short, and so on. When the searched string is on the last line, the result is always lineNumber - 1 characters shorter.

So, is this a bug, or am I missing something obvious?

EDIT: just to clarify, I am not asking for alternative ways to do this; I am asking why I get this behavior.

+7
5 answers

You get this behavior because readLine() advances the cursor past the line separator characters (LF, CRLF, or CR, depending on the file). The string returned by this method does not contain those characters, so your position calculation does not account for them.

The solution is to read in fixed-size buffers rather than line by line. Here is your code, modified:

    QFile file("h:/test.txt");
    file.open(QFile::ReadOnly | QFile::Text);
    QTextStream in(&file);

    bool found = false;
    uint pos = 0;
    qint64 buffSize = 64; // adjust to your needs

    do {
        QString temp = in.read(buffSize);
        int p = temp.indexOf("something");
        if (p < 0) {
            uint posAdj = buffSize;
            if (temp.length() < buffSize)
                posAdj = temp.length();
            pos += posAdj;
        } else {
            pos += p;
            found = true;
        }
    } while (!found && !in.atEnd());

    in.seek(0);
    QString text = in.read(pos);
    cout << text.toStdString() << endl;

EDIT

The above code has a bug: the searched word may be split across a buffer boundary. Here is an example of input that breaks it (assuming we are searching for keks):

 test test test test test test test test test test test test keks test test test test test test test test test test test test test test test test test test test test test test test test 
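The boundary problem can be reproduced without Qt. The following is a minimal sketch using only the C++ standard library; `chunkedFind` and its parameters are hypothetical names introduced for illustration and are not part of the answer's code. It searches a string in fixed-size chunks and optionally carries the last `needle.size() - 1` characters of the previous chunk forward, which is one common way to catch a match that straddles two chunks.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical helper: searches `text` in fixed-size chunks, as a stream
// reader would. With carryOver == false each chunk is searched on its own
// (the buggy variant). With carryOver == true the last needle.size() - 1
// characters of the previous window are prepended, so a match spanning the
// boundary is still seen. Returns the offset of the match, or -1.
long chunkedFind(const std::string& text, const std::string& needle,
                 std::size_t chunkSize, bool carryOver) {
    if (needle.empty() || chunkSize == 0) return -1;
    std::string tail;  // characters kept from the previous window
    for (std::size_t start = 0; start < text.size(); start += chunkSize) {
        const std::string window = tail + text.substr(start, chunkSize);
        const std::size_t p = window.find(needle);
        if (p != std::string::npos) {
            // The window begins tail.size() characters before `start`.
            return static_cast<long>(start - tail.size() + p);
        }
        if (carryOver) {
            const std::size_t keep = std::min(needle.size() - 1, window.size());
            tail = window.substr(window.size() - keep);
        }
    }
    return -1;
}
```

With the text `"test test keks test"` and a chunk size of 12, the word `keks` is split into `ke` and `ks`, so the variant without carry-over misses it while the variant with carry-over reports offset 10.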

Solution

Here is the complete code that works perfectly with all the input I tried:

    #include <QFile>
    #include <QTextStream>
    #include <iostream>

    int findPos(const QString& expr, QTextStream& stream) {
        if (expr.isEmpty())
            return -1;

        // buffer size of same length as searched expr should be OK to go
        qint64 buffSize = quint64(expr.length());

        stream.seek(0);
        QString startBuffer = stream.read(buffSize);

        int pos = 0;
        while (!stream.atEnd()) {
            QString cycleBuffer = stream.read(buffSize);
            QString searchBuffer = startBuffer + cycleBuffer;
            int bufferPos = searchBuffer.indexOf(expr);
            if (bufferPos >= 0)
                return pos + bufferPos + expr.length();
            pos += cycleBuffer.length();
            startBuffer = cycleBuffer;
        }
        return pos;
    }

    int main(int argc, char *argv[]) {
        Q_UNUSED(argc);
        Q_UNUSED(argv);

        QFile file("test.txt");
        file.open(QFile::ReadOnly | QFile::Text);
        QTextStream in(&file);

        int pos = findPos("keks", in);

        in.seek(0);
        QString text = in.read(pos);
        std::cout << text.toUtf8().data() << std::endl;
    }
+4

When the match is on the last line, you read the entire input stream, so in.atEnd() returns true. It seems that this somehow corrupts either the file or the text stream, or puts them out of sync, so the seek is no longer valid.

If you replace

    in.seek(0);
    QString text = in.read(pos);
    cout << text.toStdString() << endl;

with

    QString text;
    if (in.atEnd()) {
        file.close();
        file.open(QFile::ReadOnly | QFile::Text);
        QTextStream in1(&file);
        text = in1.read(pos);
    } else {
        in.seek(0);
        text = in.read(pos);
    }
    cout << text.toStdString().c_str() << endl;

It will work as expected. P.S. There may be a cleaner solution than reopening the file, but the problem is certainly related to reaching the end of both the stream and the file and then trying to work with them afterwards.

+4

You know the difference between Windows and *nix line endings (\r\n vs \n). When you open a file in text mode, you should be aware that every \r\n sequence is translated to \n.

The error in your original code is that you try to calculate the offset of each skipped line without knowing the exact length of that line in the text file.

    length = number_of_chars + number_of_eol_chars
    where
        number_of_chars == QString::length()
        number_of_eol_chars == (1 if \n) or (2 if \r\n)

You cannot determine number_of_eol_chars without raw file access, and your code has no such access because it opens the file as text rather than as binary. So the error in the code is that you hardcoded number_of_eol_chars as 1 instead of detecting it. For Windows text files (which use \r\n as the end-of-line sequence), pos ends up off by one for each line that is skipped.
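The arithmetic described above can be checked with a small standard-library sketch. It does not model QTextStream itself; `guessedOffset` is a hypothetical helper that only reproduces the question's bookkeeping (line length plus a guessed end-of-line length for every skipped line), under the assumption that the real positions count both bytes of each \r\n.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: offset of `needle` computed the way the question does,
// by adding line.length() + eolGuess for every line skipped before the match.
// `lines` holds the lines with their end-of-line characters already stripped.
// Returns -1 if no line contains the needle.
long guessedOffset(const std::vector<std::string>& lines,
                   const std::string& needle, std::size_t eolGuess) {
    long pos = 0;
    for (const std::string& line : lines) {
        const std::size_t p = line.find(needle);
        if (p != std::string::npos) {
            return pos + static_cast<long>(p);
        }
        pos += static_cast<long>(line.length() + eolGuess);
    }
    return -1;
}
```

For the raw data `"ab\r\ncd\r\nefg"`, the string `fg` starts at offset 9. Calling `guessedOffset({"ab", "cd", "efg"}, "fg", 1)` gives 7, which is short by exactly the two lines skipped, while an end-of-line guess of 2 gives 9.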

Fixed Code:

    #include <QFile>
    #include <QTextStream>
    #include <iostream>
    #include <string>

    int main(int argc, char *argv[]) {
        QFile f("test.txt");
        const bool isOpened = f.open( QFile::ReadOnly | QFile::Text );
        if ( !isOpened )
            return 1;

        QTextStream in( &f );
        const QString searchFor = "finally";
        bool found = false;
        qint64 pos = 0;

        do {
            const qint64 lineStartPos = in.pos();
            const QString temp = in.readLine();
            const int ofs = temp.indexOf( searchFor );
            if ( ofs < 0 ) {
                // The line is skipped: advance pos by the exact length of the line.
                // Do not hardcode "1" for the line ending, because it may be "2" (\n or \r\n).
                const qint64 length = in.pos() - lineStartPos;
                pos += length;
            } else {
                pos += ofs;
                found = true;
            }
        } while ( !found && !in.atEnd() );

        in.seek( 0 );
        const QString text = in.read( pos );
        std::cout << text.toStdString() << std::endl;
        return 0;
    }
+3

I'm not quite sure why you see this behavior, but I suspect it is related to line endings. I tried your code, and I saw the last-line behavior only when the file had CRLF line endings AND there was no newline (CRLF) at the end of the file. So yes, it's weird. If the file had LF line endings, it always worked as expected.

With that being said, it is probably not advisable to track the position by adding + 1 at the end of each line, because you won't know whether your source file uses CRLF or LF, and QTextStream always strips the line ending. Here's a function that should work better. It builds the output string line by line, and I have not seen any strange behavior with it:

    void searchStream( QString fileName, QString searchStr ) {
        QFile file( fileName );
        if ( file.open(QFile::ReadOnly | QFile::Text) == false )
            return;

        QString text;
        QTextStream in(&file);
        QTextStream out(&text);

        bool found = false;
        do {
            QString temp = in.readLine();
            int p = temp.indexOf( searchStr );
            if (p < 0) {
                out << temp << endl;
            } else {
                found = true;
                out << temp.left(p);
            }
        } while (!found && !in.atEnd());

        std::cout << text.toStdString() << std::endl;
    }

It does not track the position in the source stream, so if you really need a position, I would recommend using QTextStream::pos(), since it will be accurate whether the file uses CRLF or LF.

+2

The QTextStream::read() method takes the maximum number of characters to read as its parameter, not a file position. In many environments a position is not a simple character count; VMS and Windows come to mind as exceptions. VMS imposes a record structure that uses many hidden bits of metadata within the file, and file positions are "magic cookies".

The only file-system-independent way to get the correct value is to call QTextStream::pos() when the file is already positioned at the right place, and then keep reading until the file position returns to the same place.

(Edited because an initially unstated requirement rules out multiple allocations for buffering the text.)
However, given the requirements for the program, it makes no sense to reread the first part of the file. Start saving the text at the beginning and stop when the string is found:

    QString out;
    do {
        QString temp = in.readLine();
        int p = temp.indexOf("something");
        if (p < 0) {
            out += temp;
            out += '\n';          // readLine() strips the line ending, so restore it
        } else {
            out += temp.left(p);  // keep only the text before the match
            break;
        }
    } while (!in.atEnd());
    cout << out.toStdString() << endl;

Since you are on Windows, text-mode file handling translates '\r\n' to '\n', which causes a mismatch between file positions and character counts. There are several ways around this, but perhaps the easiest is to simply treat the file as binary (that is, not as "text": drop the QFile::Text flag) to prevent the translation:

 file.open(QFile::ReadOnly); 

Then the code should work as expected. Outputting \r\n does no harm on Windows, but it can sometimes produce annoying display artifacts in Windows text utilities. If that matters, search for \r\n and replace it with \n while the text is in memory.
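The in-memory search and replace mentioned above could look roughly like this. It is a sketch using only the C++ standard library on a `std::string`; `normalizeNewlines` is a hypothetical name, and in Qt code the same effect is presumably achievable with `QString::replace("\r\n", "\n")` on the text that was read.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns a copy of `text` with every "\r\n" pair
// replaced by "\n". A lone '\r' that is not followed by '\n' is left as is.
std::string normalizeNewlines(const std::string& text) {
    std::string out;
    out.reserve(text.size());
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == '\r' && i + 1 < text.size() && text[i + 1] == '\n') {
            continue;  // drop the '\r'; the '\n' is copied on the next pass
        }
        out.push_back(text[i]);
    }
    return out;
}
```

Note that offsets computed on the raw data no longer apply after normalization, so any position-based slicing should happen before this step.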

+2
