Different behavior of Ctrl-D (Unix) and Ctrl-Z (Windows)

As the title says, I'm trying to understand the exact behavior of Ctrl + D / Ctrl + Z in a while loop that uses gets (which I am required to use). The code I'm testing is as follows:

    #include <stdio.h>
    #include <stdlib.h>

    int main()
    {
        char str[80];

        while (printf("Insert string: ") && gets(str) != NULL) {
            puts(str);
        }

        return 0;
    }

If my input is just Ctrl + D (or Ctrl + Z on Windows), gets returns NULL and the program exits correctly. What is unclear is what happens when I type something like house^D^D (Unix) or house^Z^Z\n (Windows).

  • In the first case, my interpretation is that getchar (or something similar inside the gets function) waits for read() to get the input. The first Ctrl + D flushes the buffer, which is not empty (therefore, not EOF); then, the second time read() is called, EOF is triggered.
  • In the second case, I noticed that the first Ctrl + Z is inserted into the buffer, and everything that follows is simply ignored. Therefore, my understanding is that the first call to read() inserted house^Z and discarded everything else, returning 5 (the number of characters read). (I say 5 because otherwise I think that a lone Ctrl + Z should return 1 without triggering EOF.) Then the program waits for more input from the user, hence the second call to read().

I would like to know what I got right and what I got wrong about how this works, and how much of it is simply implementation-dependent, if any.


In addition, I noticed that on both Unix and Windows, even after EOF has been triggered, it seems to be reset to false on the next gets() call, and I don't understand why this happens or in which line of code.

I would really appreciate any help.


(12/20/2016) I heavily edited my question to avoid confusion

c windows unix stdin
1 answer

The CTRL-D and CTRL-Z end-of-file indicators serve a similar purpose for Unix and Windows systems, respectively, but are implemented in a completely different way.

On Unix systems (including Unix clones such as Linux), CTRL-D, while officially described as the end-of-file character, is actually a delimiter character. It does almost the same thing as the end-of-line character (usually carriage return, or CTRL-M), which is used to delimit lines. Both characters tell the operating system that the input line is complete and that it should be made available to the program. The only difference is that with the end-of-line character, a line feed character (CTRL-J) is inserted at the end of the input buffer to mark the end of the line, while with the end-of-file character nothing is inserted.

This means that when you enter house^D^D on Unix, the read system call first returns a buffer of length 5 containing the 5 characters of house. When read is called again to get more input, it returns a buffer of length 0 with no characters in it. Since a zero-length read on a normal file indicates that the end of the file has been reached, the gets library function interprets this as end of file as well and stops reading input. However, since it has already filled the buffer with 5 characters, it does not return NULL to indicate that it reached the end of the file. And since it has not actually reached the end of a file (terminal devices are not really files), later calls to gets will make further calls to read, which will return any subsequent characters the user types.

On Windows, CTRL-Z is handled very differently. The biggest difference is that it is not treated specially by the operating system at all. When you type house^Z^Z^M on Windows, only the carriage return character receives special handling. As on Unix, the carriage return makes the typed line available to the program, although in this case both a carriage return and a line feed are added to the buffer to mark the end of the line. As a result, the ReadFile function returns a 9-byte buffer containing the 9 characters house^Z^Z^M^J.

It is actually the program itself, specifically the C runtime library, that treats CTRL-Z specially. In the case of the Microsoft C runtime library, when it sees the CTRL-Z character in the buffer returned by ReadFile, it treats it as an end-of-file marker and ignores everything after it. In the example from the previous paragraph, gets ends up calling ReadFile again to get more input, because the fact that it has seen the CTRL-Z character is not remembered when reading from the console (or another device), and it has not yet seen the end of the line (which was ignored). If you then press Enter again, gets returns with the buffer filled with the 7 bytes house^Z\0 (a 0 byte is added to mark the end of the string). By default, it does much the same thing when reading from regular files: if a CTRL-Z character appears in a file, it and everything after it is ignored. This is for backward compatibility with CP/M, which only supported files whose lengths were multiples of 128 and used CTRL-Z to mark where text files were really supposed to end.

Note that the Unix and Windows behaviors described above are only the normal, default handling of user input. The Unix handling of CTRL-D occurs only when reading from a terminal in canonical mode, and it is possible to change the "end-of-file" character to something else. On Windows, the operating system never treats CTRL-Z specially, but whether the C runtime library does depends on whether the FILE stream being read is in text or binary mode. This is why portable programs should always include the b character in the mode string when opening binary files (for example, fopen("foo.gif", "rb")).
