Buffering in the standard I/O library

In the book "Advanced Programming in the UNIX Environment" (2nd edition), the author wrote in section 5.5 (stream operations of the standard I/O library):

When a file is opened for reading and writing (a plus sign in the type), the following restrictions apply.

  • Output cannot be directly followed by input without an intervening fflush, fseek, fsetpos, or rewind.
  • Input cannot be directly followed by output without an intervening fseek, fsetpos, or rewind, or an input operation that encounters the end of the file.

I got confused about this. Can anyone explain it a bit? For example, in what situations will input and output function calls that violate the above restrictions cause unexpected program behavior? I guess the reason for the restrictions may be related to buffering in the library, but I'm not so clear on it.

2 answers

It is not clear what you are asking.

Your main question: "Why does the book say that I can't do this?" Well, the book says you can't do this because the POSIX/SUS/etc. standard says it is undefined behavior in the fopen specification, which it does to align with the ISO C standard (N1124 working draft, because the final version is not free), 7.19.5.3.

Then you ask: "In what situation does an input or output function call that violates the above restrictions cause unexpected program behavior?"

Undefined behavior will always lead to unexpected behavior, because the whole point is that you are not allowed to expect anything. (See 3.4.3 and 4 in the C standard above.)

But beyond that, it is not even clear what they could have specified that would make sense. Look at this:

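    #include <stdio.h>

    int main(int argc, char *argv[]) {
        FILE *fp = fopen("foo", "r+");   /* "foo" must already exist */
        if (fp == NULL) return 1;
        fseek(fp, 0, SEEK_SET);
        fwrite("foo", 1, 3, fp);
        fseek(fp, 0, SEEK_SET);
        fwrite("bar", 1, 3, fp);
        char buf[4] = { 0 };
        /* Undefined: input directly follows output with no flush or seek. */
        size_t ret = fread(buf, 1, 3, fp);
        printf("%d %s\n", (int)ret, buf);
    }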

So, should this print 3 foo, because that is what is on disk, or 3 bar, because that is what is in the "conceptual file", or 0, because there is nothing after what was written, so you are reading at EOF? And if you think there is an obvious answer, consider that bar may already have been flushed, or even partially flushed, so that the file on disk now contains boo.

If you are asking the more practical question, "Can I get away with it in some cases?", I believe that on most Unix platforms the code above will give you an occasional segfault, but 3 xyz (either 3 uninitialized characters or, in more complicated cases, 3 characters that happened to be in the buffer before it got overwritten) the rest of the time. So no, you can't get away with it.
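For comparison, here is a minimal sketch of a legal variant of the program above, assuming the goal is to read back what was just written. The helper name write_then_read and the use of "w+" (so the file gets created) are assumptions made for illustration, not part of the original example. The only essential change is the positioning call between the last fwrite and the fread.

```c
#include <stdio.h>

/* Hypothetical helper: writes "foo", overwrites it with "bar", then reads
 * the data back legally. Returns the number of bytes read, or -1 on error. */
int write_then_read(const char *path, char out[4])
{
    FILE *fp = fopen(path, "w+");   /* assumption: "w+" so the file is created */
    if (fp == NULL)
        return -1;

    fwrite("foo", 1, 3, fp);
    fseek(fp, 0, SEEK_SET);         /* seek between two writes: allowed, not required */
    fwrite("bar", 1, 3, fp);

    /* Required: a positioning call (or fflush) between output and input. */
    if (fseek(fp, 0, SEEK_SET) != 0) {
        fclose(fp);
        return -1;
    }

    size_t n = fread(out, 1, 3, fp);
    out[n] = '\0';
    fclose(fp);
    return (int)n;
}
```

With the seek back to offset 0, the read is well defined and returns the three bytes written last. Using fflush instead would also be legal, but the position would stay at offset 3, so on a 3-byte file the read would hit EOF and return 0.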

Finally, you say: "I assume that the reason for the restrictions may be related to buffering in the library, but I'm not so clear." It sounds like you are asking about the rationale.

You are right that it is about buffering. As I pointed out above, there really is no intuitively right thing to do, but also, think about the implementation. Remember, the Unix way has always been "if the simplest and most efficient code is good enough, do that."

There are four ways you could implement something like stdio:

  • Use a shared buffer for reading and writing, and write code to switch contexts as needed. This is a bit complicated and will flush buffers more often than you would ideally like.
  • Use two separate buffers and cache-style code to determine when one operation needs to flush and/or invalidate the other buffer. This is even more complicated and makes the FILE object take up twice as much memory.
  • Use a shared buffer and simply don't allow interleaving reads and writes without explicit flushes in between. This is dead simple and as efficient as possible.
  • Use a shared buffer and implicitly flush between interleaved reads and writes. This is almost as simple and almost as efficient, and much safer, but not really better in any way other than safety.
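The difference between the last two options can be sketched with a toy model of the mode bookkeeping. This is not how any real stdio is written; the names toy_stream, toy_sync, toy_begin_strict, and toy_begin_lenient are all hypothetical. The pure form of the third option would have no check at all, which is exactly why misuse is undefined; the strict function below shows that option with a safety check added.

```c
/* What the single shared buffer currently holds. */
enum toy_mode { TOY_IDLE, TOY_READING, TOY_WRITING };

struct toy_stream {
    enum toy_mode mode;
    int flushes;          /* counts how many times the buffer was synced */
};

/* Explicit synchronization point, playing the role of fflush or fseek. */
void toy_sync(struct toy_stream *s)
{
    s->flushes++;
    s->mode = TOY_IDLE;
}

/* Third option plus a safety check: refuse a read/write switch
 * that was not preceded by an explicit sync. */
int toy_begin_strict(struct toy_stream *s, enum toy_mode want)
{
    if (s->mode != TOY_IDLE && s->mode != want)
        return -1;        /* caller interleaved without syncing */
    s->mode = want;
    return 0;
}

/* Fourth option: run the same check on every call, but sync implicitly. */
int toy_begin_lenient(struct toy_stream *s, enum toy_mode want)
{
    if (s->mode != TOY_IDLE && s->mode != want)
        toy_sync(s);
    s->mode = want;
    return 0;
}
```

Either checked variant adds a comparison to every read and write call, which is the cost the unchecked design avoids.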

So, Unix went with the third option and documented it, and SUS, POSIX, C89, etc. standardized that behavior.

You might say, "Come on, it can't be that inefficient." Well, you have to remember that Unix was designed for low-end systems of the late 1970s, and the basic philosophy is that it is not worth trading away even a little performance unless there is a real benefit. Most importantly, stdio has to handle trivial functions like getc and putc, not just fancy ones like fscanf and fprintf, and adding anything to those functions (or macros) that makes them 5x as slow would make a huge difference in a lot of real-world code.

If you look at modern implementations such as *BSD, glibc, Darwin, MSVCRT, etc. (most of which are open source, or at least commercial but shared source), most of them do things the same way. A few add safety checks, but they generally give you an error for interleaving rather than flushing implicitly. After all, if your code is wrong, it is better to be told that it is wrong than to have the library try to DWIM (do what I mean).

For example, look at the early Darwin (OS X) fopen, fread, and fwrite (chosen because it is nice and simple, and has easily linkable code that is syntax-colored but also copy-pasteable). All that fread has to do is copy bytes out of the buffer and refill the buffer if it runs out. You can't get any simpler than that.
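To make "copy bytes out of the buffer and refill when it runs out" concrete, here is a toy reader over an in-memory string. It is a sketch under stated assumptions, not the Darwin code: the names toy_reader, toy_refill, and toy_read are hypothetical, and a string stands in for the underlying file.

```c
#include <stddef.h>
#include <string.h>

struct toy_reader {
    const char *src;      /* stands in for the underlying file */
    size_t src_len;       /* total bytes available in src */
    size_t src_pos;       /* next byte of src not yet buffered */
    char buf[4];          /* deliberately tiny buffer */
    size_t buf_pos;       /* next unread byte in buf */
    size_t buf_len;       /* number of valid bytes in buf */
};

/* Refill the buffer from the source; returns how many bytes are now buffered. */
static size_t toy_refill(struct toy_reader *r)
{
    size_t n = r->src_len - r->src_pos;
    if (n > sizeof r->buf)
        n = sizeof r->buf;
    memcpy(r->buf, r->src + r->src_pos, n);
    r->src_pos += n;
    r->buf_pos = 0;
    r->buf_len = n;
    return n;
}

/* Copy up to `want` bytes into `out`, refilling whenever the buffer is empty.
 * Returns the number of bytes actually copied. */
size_t toy_read(struct toy_reader *r, char *out, size_t want)
{
    size_t done = 0;
    while (done < want) {
        if (r->buf_pos == r->buf_len && toy_refill(r) == 0)
            break;                        /* source exhausted */
        size_t n = r->buf_len - r->buf_pos;
        if (n > want - done)
            n = want - done;
        memcpy(out + done, r->buf + r->buf_pos, n);
        r->buf_pos += n;
        done += n;
    }
    return done;
}
```

Note that toy_read never asks whether the buffer holds unwritten output. A reader this simple is only correct if the caller guarantees the buffer is in "read" state, which is the guarantee the restrictions in the question provide.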


You aren't allowed to interleave input and output operations. For example, you can't use formatted input to seek to a particular point in the file and then start writing bytes at that point. This lets the implementation assume that at any given time, the sole I/O buffer contains only data to be read (by you) or data to be written (to the OS), without doing any safety checks.

    f = fopen( "myfile", "r+" );      /* open for read and write */
    fscanf( f, "hello, world\n" );    /* scan past file header */
    fprintf( f, "daturghhhf\n" );     /* write some data - illegal */

This is OK if you call fseek( f, 0, SEEK_CUR ); between the fscanf and the fprintf, because that changes the mode of the I/O buffer without repositioning it.

Why is it done this way? As far as I can tell, because OS vendors often want to support automatic mode switching, but fail. The stdio specification allows a buggy implementation to be compliant, and a working implementation of automatic mode switching simply implements a compatible extension.
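A minimal sketch of the legal read-then-write pattern might look like this. The helper name write_after_header is hypothetical, and fgets is swapped in for fscanf to skip the header line, since it makes the amount of input consumed easier to reason about.

```c
#include <stdio.h>

/* Hypothetical helper: skips one header line, then overwrites what follows.
 * Returns 0 on success, -1 on any error. */
int write_after_header(const char *path)
{
    FILE *f = fopen(path, "r+");
    if (f == NULL)
        return -1;

    char header[32];
    if (fgets(header, sizeof header, f) == NULL) {   /* read past the header */
        fclose(f);
        return -1;
    }

    /* Switch from input to output without moving the file position. */
    if (fseek(f, 0, SEEK_CUR) != 0) {
        fclose(f);
        return -1;
    }

    int rc = (fputs("data\n", f) == EOF) ? -1 : 0;
    if (fclose(f) != 0)
        rc = -1;
    return rc;
}
```

The write lands immediately after the header line, because the seek discards any read-ahead in the buffer while keeping the logical position.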
