Why does redirection work where the pipeline fails?

In theory, these two commands should be equivalent:

#1

    type tmp.txt | test.exe

#2

    test.exe < tmp.txt

I have a process that uses #1 and has worked just fine for many years. At some point over the past year, we started compiling the program with a newer version of Visual Studio, and now it fails due to malformed input (see below). But #2 succeeds (no errors, and we see the expected output). Why does #2 succeed where #1 fails?

I was able to reduce test.exe to the program below. Our input file has exactly one tab per line and consistently uses CR/LF line endings, so this program should never write to stderr:

    #include <cstring>   // strchr
    #include <iostream>
    #include <string>

    int __cdecl main(int argc, char** argv)
    {
        std::istream* pIs = &std::cin;
        std::string line;
        int lines = 0;

        while (!(pIs->eof()))
        {
            if (!std::getline(*pIs, line))
            {
                break;
            }

            // Count the tabs on this line; valid input has exactly one.
            const char* pLine = line.c_str();
            int tabs = 0;
            while (pLine)
            {
                pLine = strchr(pLine, '\t');
                if (pLine)
                {
                    // move past the tab
                    pLine++;
                    tabs++;
                }
            }

            if (tabs > 1)
            {
                std::cerr << "We lost a linebreak after " << lines << " good lines.\n";
                lines = -1;
            }
            lines++;
        }

        return 0;
    }

When run via #1, I get the following output, with the same numbers every time (in each case, it is because getline returns two lines concatenated without the line break between them). When run via #2, there is (correctly) no output:

    We lost a linebreak after 8977 good lines.
    We lost a linebreak after 1468 good lines.
    We lost a linebreak after 20985 good lines.
    We lost a linebreak after 6982 good lines.
    We lost a linebreak after 1150 good lines.
    We lost a linebreak after 276 good lines.
    We lost a linebreak after 12076 good lines.
    We lost a linebreak after 2072 good lines.
    We lost a linebreak after 4576 good lines.
    We lost a linebreak after 401 good lines.
    We lost a linebreak after 6428 good lines.
    We lost a linebreak after 7228 good lines.
    We lost a linebreak after 931 good lines.
    We lost a linebreak after 1240 good lines.
    We lost a linebreak after 2432 good lines.
    We lost a linebreak after 553 good lines.
    We lost a linebreak after 6550 good lines.
    We lost a linebreak after 1591 good lines.
    We lost a linebreak after 55 good lines.
    We lost a linebreak after 2428 good lines.
    We lost a linebreak after 1475 good lines.
    We lost a linebreak after 3866 good lines.
    We lost a linebreak after 3000 good lines.
1 answer

This turns out to be a known issue:

The bug is actually in the lower-level _read function, which the stdio library functions (including both fread and fgets) use to read from a file descriptor.

The bug in _read is as follows. If...

  • you are reading from a pipe in text mode,
  • you call _read to read N bytes,
  • _read successfully reads N bytes, and
  • the last byte read is a carriage return (CR) character,

then _read completes the read successfully but returns N-1 instead of N. The CR or LF character at the end of the result buffer is not counted in the return value.

In the specific case reported in this bug, fread calls _read to fill the stream buffer. _read reports that it filled N-1 bytes of the buffer, and the final CR or LF character is lost.

The bug is fundamentally timing-sensitive, because whether _read can successfully read N bytes from the pipe depends on how much data has been written to the pipe. Changing the buffer size, or changing when the buffer is flushed, may reduce the likelihood of the problem, but it will not necessarily work around the problem in 100% of cases.

There are several possible workarounds:

  • Use a binary-mode pipe and do the text-mode CRLF => LF translation manually on the reader side. This is not particularly difficult to do (scan the buffer for CRLF pairs and replace each with a single LF).
  • Call ReadFile with _osfhnd(fh), bypassing the CRT I/O library on the reader side entirely (although this also requires manual text-mode translation, since the OS will not do text-mode translation for you).
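A minimal sketch of the manual CRLF => LF translation that both workarounds need might look like the following. The `CrlfTranslator` class and its method names are hypothetical, made up for this illustration; they are not part of the CRT. It assumes the reader gets its data in raw chunks (for example from a binary-mode read or from ReadFile), so it remembers a trailing CR between calls to handle a CRLF pair that is split across two reads.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of the CRT): streaming CRLF -> LF translator.
// Feed it raw chunks read in binary mode. It remembers a CR left at the end
// of one chunk so a CRLF pair split across two reads is still collapsed.
class CrlfTranslator {
public:
    // Appends the translated form of data[0..size) to out.
    void translate(const char* data, std::size_t size, std::string& out) {
        assert(data != nullptr || size == 0);
        for (std::size_t i = 0; i < size; ++i) {
            const char c = data[i];
            if (pendingCr_) {
                pendingCr_ = false;
                if (c == '\n') {
                    out.push_back('\n');  // CRLF pair becomes a single LF
                    continue;
                }
                out.push_back('\r');      // lone CR is kept as-is
            }
            if (c == '\r') {
                pendingCr_ = true;        // decide once the next byte is seen
            } else {
                out.push_back(c);
            }
        }
    }

    // Call at end of input: emits a CR that was never followed by an LF.
    void finish(std::string& out) {
        if (pendingCr_) {
            out.push_back('\r');
            pendingCr_ = false;
        }
    }

private:
    bool pendingCr_ = false;
};
```

On the reader side, one would presumably switch stdin to binary mode first (on Windows, `_setmode(_fileno(stdin), _O_BINARY)` from `<io.h>` and `<fcntl.h>`), read chunks of whatever size is convenient, pass each chunk through `translate`, call `finish` at end of input, and then split the result on LF. Keeping the pending-CR state in the object rather than looking ahead in the buffer is a design choice that makes the translation independent of how the pipe happens to split the data.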

We have fixed this bug for the next update to the Universal CRT. Note that the Universal CRT is an operating system component and is serviced independently of the Visual C++ libraries. The next update to the Universal CRT will likely arrive around the same time as the Windows 10 update this summer.
