Explain a change in GNU C++ filebuf::underflow() interacting with filebuf::seekoff()

My company's products run on a number of qualified Linux hardware/software configurations. Historically, the compiler used has been GNU C++. For the purposes of this post, let's consider version 3.2.3 the baseline, since our software "worked as expected" up through that version.

With the introduction of a newer qualified platform using GNU C++ version 3.4.4, we began to observe performance problems that we had not seen before. After some digging, one of our engineers came up with this test program:

    #include <cstdio>    // EOF
    #include <fstream>
    #include <iostream>

    using namespace std;

    // filebuf that counts how often underflow() is called
    class my_filebuf : public filebuf
    {
    public:
        my_filebuf() : filebuf(), d_underflows(0) {}
        virtual ~my_filebuf() {}

        virtual pos_type seekoff(off_type, ios_base::seekdir,
                                 ios_base::openmode mode = ios_base::in | ios_base::out);

        virtual int_type underflow();

    public:
        unsigned int d_underflows;
    };

    filebuf::pos_type my_filebuf::seekoff(
        off_type off,
        ios_base::seekdir way,
        ios_base::openmode mode
    )
    {
        return filebuf::seekoff(off, way, mode);
    }

    filebuf::int_type my_filebuf::underflow()
    {
        d_underflows++;
        return filebuf::underflow();
    }

    int main()
    {
        my_filebuf fb;
        fb.open("log", ios_base::in);
        if (!fb.is_open())
        {
            cerr << "need log file" << endl;
            return 1;
        }

        int count = 0;
        streampos pos = EOF;
        while (fb.sbumpc() != EOF)
        {
            count++;

            // calling pubseekoff(0, ios::cur) *forces* underflow
            pos = fb.pubseekoff(0, ios::cur);
        }

        cerr << "pos=" << pos << endl;
        cerr << "read chars=" << count << endl;
        cerr << "underflows=" << fb.d_underflows << endl;

        return 0;
    }

We ran it against a log file of about 751 KB. On the previous configurations, we got this result:

    $ buftest
    pos=768058
    read chars=768058
    underflows=0

With the newer version, the result is:

    $ buftest
    pos=768058
    read chars=768058
    underflows=768059

Comment out the call to pubseekoff(0, ios::cur), and the excessive calls to underflow() go away. So clearly, in newer versions of g++, calling pubseekoff() "invalidates" the buffer, forcing a call to underflow().

I have read the standards document, and the wording on pubseekoff() is certainly ambiguous. What is the relationship between the underlying file pointer position and gptr(), for example? Before or after a call to underflow()? Regardless, I find it annoying that g++ "changed horses in midstream," so to speak. Moreover, even if a general seekoff() required invalidating the buffer pointers, why should the equivalent of ftell() do so?

Can someone point me to a discussion thread among the implementers that led to this change in behavior? Do you have a succinct description of the choices and trade-offs involved?

Extra credit

Clearly, I don't really know what I'm doing. I was experimenting to determine whether there was a way, however non-portable, to bypass the invalidation when the offset is 0 and the seekdir is ios::cur. I came up with the following hack, which directly accesses the filebuf data member _M_file (I only needed it to compile with version 3.4.4 on my machine):

    int sc(0);

    filebuf::pos_type my_filebuf::seekoff(
        off_type off,
        ios_base::seekdir way,
        ios_base::openmode mode
    )
    {
        if ((off == 0) && (way == ios::cur))
        {
            FILE *file = _M_file.file();
            pos_type pos = pos_type(ftell(file));

            sc++;
            if ((sc % 100) == 0)
            {
                cerr << "POS IS " << pos << endl;
            }

            return pos;
        }

        return filebuf::seekoff(off, way, mode);
    }

However, the diagnostic that prints the position every 100 seekoff attempts yields 8192 every time. Huh? Since this is the FILE * member of the filebuf itself, I would expect its file position indicator to stay in sync with any underflow() calls made by the filebuf. Why am I wrong?

Update

First, let me emphasize that I understand this later part of my post is all about non-portable hacks. Still, I don't understand the details here. I tried calling

 pos_type pos = _M_file.seekoff(0,ios::cur); 

instead, and it happily advances through the sample file rather than getting stuck at 8192.

Final update

Inside my company, we have made some workarounds that reduce the performance hit enough for us to live with it.

Externally, David Krauss filed a bug against GNU's libstdc++ streams, and Paolo Carlini recently checked in a fix. The consensus was that the undesired behavior was within the scope of the Standard, but that there was a reasonable fix for the edge case I described.

So thanks, StackOverflow, David Krauss, Paolo Carlini and all the GNU developers!

2 answers

The requirements of seekoff are certainly confusing, but seekoff(0, ios::cur) is supposed to be a special case that doesn't synchronize anything. So this could probably be considered a bug.

And this still happens in GCC 4.2.1 and 4.5 ...

The problem is that (0, ios::cur) is not special-cased in _M_seek, which seekoff uses to call fseek and obtain its return value. As long as that succeeds, _M_seek unconditionally calls _M_set_buffer(-1);, which predictably invalidates the internal buffer. The next read operation then triggers underflow.

Found the diff! See the change at -473,41 +486,26. The comment was

  (seekoff): Simplify, set _M_reading, _M_writing to false, call _M_set_buffer(-1) ('uncommitted'). 

So this wasn't done to fix a bug.

Filed bug: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45628
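
Until a fixed library is available, one portable way to sidestep the issue is to not ask the filebuf for its position inside the hot loop at all, and to count consumed characters yourself. Below is a minimal sketch of that idea. The class counting_reader and its members are hypothetical names made up for illustration; this is not the asker's actual workaround, which is not described here. It also assumes the file is opened so that one character read equals one byte of offset (binary mode, or a POSIX system where text mode does no translation).

```cpp
#include <fstream>
#include <ios>
#include <streambuf>

// Hypothetical helper (not from the original post): reads from a streambuf
// and keeps its own running offset, so the read loop never needs to call
// pubseekoff(0, ios_base::cur) just to learn the current position.
class counting_reader
{
public:
    typedef std::streambuf::int_type    int_type;
    typedef std::streambuf::pos_type    pos_type;
    typedef std::streambuf::off_type    off_type;
    typedef std::streambuf::traits_type traits_type;

    explicit counting_reader(std::streambuf& sb, std::streamoff start = 0)
        : d_sb(sb), d_pos(start) {}

    // Read one character; advance the tracked offset only on success.
    int_type get()
    {
        const int_type c = d_sb.sbumpc();
        if (!traits_type::eq_int_type(c, traits_type::eof()))
            ++d_pos;
        return c;
    }

    // Offset of the next character to be read. No library call involved.
    std::streamoff tell() const { return d_pos; }

    // Real seeks still go through the streambuf (and may discard its
    // buffer); on success the tracked offset is resynchronized.
    pos_type seek(std::streamoff off)
    {
        const pos_type p =
            d_sb.pubseekoff(off, std::ios_base::beg, std::ios_base::in);
        if (p != pos_type(off_type(-1)))
            d_pos = off;
        return p;
    }

private:
    std::streambuf& d_sb;
    std::streamoff  d_pos;
};
```

In the test program from the question, this would mean constructing a counting_reader around fb, looping on get() instead of fb.sbumpc(), and reading tell() where pubseekoff(0, ios::cur) was called. The trade-off is that the tracked offset is only as good as the assumption above, and any code that moves the filebuf behind the reader's back will desynchronize it.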


Well, I don't know the exact reason for the change, but it appears the changes were made for the following (see the GCC 3.4 changelog):

  • Optimized streambuf, filebuf, separate synched with C Standard I/O streambuf.
  • Support for large files (files larger than 2 GB on 32-bit systems).

I suspect that large file support is the big feature that would require a change like this, since IOStreams can no longer assume that it can map the whole file into memory.

Correct synchronization with cstdio is also an operation that may require more flushes to disk. You can disable that using std::ios_base::sync_with_stdio.
