Yesterday I discovered a strange bug in some fairly simple code that basically reads text from an ifstream and tokenizes it. The code that actually fails makes several get()/peek() calls while looking for the token "/*". If the token is found in the stream, unget() is called so that the next method sees the stream starting with the token.
Sometimes, apparently depending only on the length of the file, the unget() call fails. Internally it calls pbackfail(), which then returns EOF. However, after clearing the stream state I can happily read more characters, so it is not really EOF.
After digging in, here is the complete code that easily reproduces the problem:
    #include <iostream>
    #include <fstream>
    #include <string>

    //generate simplest string possible that triggers problem
    void GenerateTestString( std::string& s, const size_t nSpacesToInsert )
    {
      s.clear();
      for( size_t i = 0 ; i < nSpacesToInsert ; ++i )
        s += " ";
      s += "/*";
    }

    //write string to file, then open same file again in ifs
    bool WriteTestFileThenOpenIt( const char* sFile, const std::string& s, std::ifstream& ifs )
    {
      {
        std::ofstream ofs( sFile );
        if( ( ofs << s ).fail() )
          return false;
      }
      ifs.open( sFile );
      return ifs.good();
    }

    //find token, unget if found, report error, show extra data can be read even after error
    bool Run( std::istream& ifs )
    {
      bool bSuccess = true;

      for( ; ; )
      {
        int x = ifs.get();
        if( ifs.fail() )
          break;

        if( x == '/' )
        {
          x = ifs.peek();
          if( x == '*' )
          {
            ifs.unget();
            if( ifs.fail() )
            {
              std::cout << "oops.. unget() failed" << std::endl;
              bSuccess = false;
            }
            else
            {
              x = ifs.get();
            }
          }
        }
      }

      if( !bSuccess )
      {
        ifs.clear();
        std::string sNext;
        ifs >> sNext;
        if( !sNext.empty() )
          std::cout << "remaining data after unget: '" << sNext << "'" << std::endl;
      }

      return bSuccess;
    }

    int main()
    {
      std::string s;
      const char* testFile = "tmp.txt";
      for( size_t i = 0 ; i < 12290 ; ++i )
      {
        GenerateTestString( s, i );

        std::ifstream ifs;
        if( !WriteTestFileThenOpenIt( testFile, s, ifs ) )
        {
          std::cout << "file I/O error, aborting..";
          break;
        }

        if( !Run( ifs ) )
          std::cout << "** failed for string length = " << s.length() << std::endl;
      }
      return 0;
    }
The program fails when the string length gets close to typical buffer sizes that are multiples of 4096 (4096, 8192, 12288). Here is the output:
    oops.. unget() failed
    remaining data after unget: '*'
    ** failed for string length = 4097
    oops.. unget() failed
    remaining data after unget: '*'
    ** failed for string length = 8193
    oops.. unget() failed
    remaining data after unget: '*'
    ** failed for string length = 12289
This happens when testing on Windows XP and Windows 7, compiled in both debug and release mode, with both the dynamic and the static runtime, on both 32-bit and 64-bit systems/compilers, all with VS2008 and default compiler/linker options. When testing with gcc 4.4.5 on a 64-bit Debian system, no problems were found.
Questions:
- Can other people reproduce this? I would really appreciate this kind of active SO collaboration.
- Is there something wrong with the code that might cause the problem (never mind whether the code makes sense)?
- Or are there any compiler flags that might cause this behavior?
- All the parser code is quite critical for the application and is tested extensively, but of course this problem was not found by the test code. Should I come up with extreme test cases, and if so, how do I do that? How could I have predicted that this could cause problems?
- If this is really a bug, where is the best place to report it?