fgetpos() behavior depends on newline style

Consider these two files:

file1.txt (Windows newline)

    abc\r\n
    def\r\n

file2.txt (Unix newline)

    abc\n
    def\n

I noticed that for file2.txt, the position obtained with fgetpos is not incremented correctly. I am working on Windows.

Let me show you an example. The following code:

    #include <cstdio>

    void read(FILE *file)
    {
        int c = fgetc(file);
        printf("%c (%d)\n", (char)c, c);

        fpos_t pos;
        fgetpos(file, &pos); // save the position

        c = fgetc(file);
        printf("%c (%d)\n", (char)c, c);

        fsetpos(file, &pos); // restore the position - should point to previous
        c = fgetc(file);     // character, which is not the case for file2.txt
        printf("%c (%d)\n", (char)c, c);
        c = fgetc(file);
        printf("%c (%d)\n", (char)c, c);
    }

    int main()
    {
        FILE *file = fopen("file1.txt", "r");
        printf("file1:\n");
        read(file);
        fclose(file);

        file = fopen("file2.txt", "r");
        printf("\n\nfile2:\n");
        read(file);
        fclose(file);

        return 0;
    }

gives the following result:

    file1:
    a (97)
    b (98)
    b (98)
    c (99)

    file2:
    a (97)
    b (98)
     (-1)
     (-1)

file1.txt works as expected, while file2.txt behaves strangely. To investigate what was going wrong, I tried the following code:

    #include <cstdio>

    void read(FILE *file)
    {
        int c;
        fpos_t pos;
        while (1)
        {
            fgetpos(file, &pos); // query the position before each read
            printf("pos: %d ", (int)pos);
            c = fgetc(file);
            if (c == EOF) break;
            printf("c: %c (%d)\n", (char)c, c);
        }
    }

    int main()
    {
        FILE *file = fopen("file1.txt", "r");
        printf("file1:\n");
        read(file);
        fclose(file);

        file = fopen("file2.txt", "r");
        printf("\n\nfile2:\n");
        read(file);
        fclose(file);

        return 0;
    }

I got this output:

    file1:
    pos: 0 c: a (97)
    pos: 1 c: b (98)
    pos: 2 c: c (99)
    pos: 3 c: 
     (10)
    pos: 5 c: d (100)
    pos: 6 c: e (101)
    pos: 7 c: f (102)
    pos: 8 c: 
     (10)
    pos: 10

    file2:
    pos: 0 c: a (97)   // something is going wrong here...
    pos: -1 c: b (98)
    pos: 0 c: c (99)
    pos: 1 c: 
     (10)
    pos: 3 c: d (100)
    pos: 4 c: e (101)
    pos: 5 c: f (102)
    pos: 6 c: 
     (10)
    pos: 8

I know that fpos_t is not intended to be interpreted by the programmer, as it is implementation-dependent. However, the above example illustrates the problem with fgetpos/fsetpos.

How is it possible that the newline sequence affects the internal position of the file, even before those characters are encountered?

2 answers

I would say the problem is probably caused by the second file confusing the implementation, since it is opened in text mode but does not meet the requirements for a text stream on this platform.

From the standard:

A text stream is an ordered sequence of characters composed into lines, each line consisting of zero or more characters plus a terminating new-line character.

Your second file stream contains no valid newlines (since the implementation looks for \r\n to convert to the newline character internally). As a result, the implementation may not compute line lengths properly and can become hopelessly confused when you try to move around in the stream.

Additionally:

Characters may have to be added, altered, or deleted on input and output to conform to differing conventions for representing text in the host environment.

Keep in mind that the library will not just read one byte from the file each time you call fgetc; it will read the entire file (for one so small) into the stream's buffer and operate on that.
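If the program does not rely on newline translation, one possible workaround is to open the file in binary mode, where no \r\n conversion takes place and the saved position should round-trip regardless of newline style. The following is only a minimal sketch under that assumption; `write_bytes` and `reread_matches` are hypothetical helper names, not part of any library, and the caller would then have to handle \r\n itself.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical helper: writes raw bytes with no newline translation ("wb").
void write_bytes(const char *path, const char *data, std::size_t size)
{
    std::FILE *file = std::fopen(path, "wb");
    assert(file != nullptr);
    std::size_t written = std::fwrite(data, 1, size, file);
    assert(written == size);
    (void)written; // silence unused warning when NDEBUG disables assert
    std::fclose(file);
}

// Hypothetical helper: reads one character, saves the position, reads the
// next character, restores the position, and reads again. Returns true if
// the character read after fsetpos equals the one read after fgetpos.
bool reread_matches(const char *path)
{
    std::FILE *file = std::fopen(path, "rb"); // binary mode: bytes are not translated
    if (!file)
        return false;

    bool ok = false;
    std::fpos_t pos;
    if (std::fgetc(file) != EOF && std::fgetpos(file, &pos) == 0) {
        int first = std::fgetc(file); // character right after the saved position
        if (first != EOF && std::fsetpos(file, &pos) == 0)
            ok = (std::fgetc(file) == first);
    }
    std::fclose(file);
    return ok;
}
```

Calling `reread_matches` on a file with Unix newlines and on one with Windows newlines should return true in both cases, because in binary mode the library has no translated characters to compensate for.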


I am adding this as supporting information for teppic's answer:

When dealing with a FILE* that has been opened in text mode rather than binary mode, the fgetpos() function in VC++ 11 (VS 2012) may (and, for your file2.txt example, does) end up in this stretch of code:

    // ...

    if (_osfile(fd) & FTEXT) {
        /* (1) If we're not at eof, simply copy _bufsiz
           onto rdcnt to get the # of untranslated
           chars read. (2) If we're at eof, we must
           look through the buffer expanding the '\n'
           chars one at a time. */

        // ...

        if (_lseeki64(fd, 0i64, SEEK_END) == filepos) {

            max = stream->_base + rdcnt;
            for (p = stream->_base; p < max; p++)
                if (*p == '\n')               // <---
                    /* adjust for '\r' */     // <---
                    rdcnt++;                  // <---

    // ...

The code assumes that any \n character in the buffer was originally a \r\n sequence that was normalized when the data was read into the buffer. So there are times when it tries to account for that (now missing) \r character, which it believes earlier processing of the file removed from the buffer. This particular adjustment happens when you are near the end of the file; however, there are other similar adjustments that account for the removed \r bytes in the fgetpos() handling.
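To make the effect of that adjustment concrete, here is a simplified model of the arithmetic. It is not the CRT source: `modeled_position` is a hypothetical function, and the offset term is an assumption inferred from the positions printed in the question. It assumes the whole file has already been read into the buffer, and that the library counts one extra byte for every \n it sees.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Simplified model (an assumption, not the actual library code): the position
// reported after `consumed` characters have been read from a text stream whose
// whole content is already in `buffer` (after any \r\n -> \n translation).
// `raw_size` is the real size of the file on disk, in bytes.
long long modeled_position(long long raw_size, const std::string &buffer,
                           std::size_t consumed)
{
    assert(consumed <= buffer.size());
    const auto consumed_end = buffer.begin() + static_cast<std::ptrdiff_t>(consumed);

    // Bytes the library believes the buffer occupied on disk:
    // one extra byte per '\n' for the '\r' it assumes was stripped.
    long long assumed_raw = static_cast<long long>(buffer.size()) +
                            std::count(buffer.begin(), buffer.end(), '\n');

    // Where the library believes the buffer starts in the file.
    long long buffer_start = raw_size - assumed_raw;

    // Distance travelled inside the buffer, again counting an assumed '\r'
    // for every '\n' already consumed.
    long long offset = static_cast<long long>(consumed) +
                       std::count(buffer.begin(), consumed_end, '\n');

    return buffer_start + offset;
}
```

For file1.txt (10 bytes on disk, 8 characters in the buffer) the assumption holds and the buffer start comes out as 0. For file2.txt (8 bytes on disk, the same 8 characters in the buffer) the model places the buffer start at -2, which reproduces the -1, 0, 1, 3, ... sequence shown in the question.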

