How does awk handle CR LF on Cygwin?

On Linux, this works as expected:

 $ echo -e "line1\r\nline2"|awk -v RS="\r\n" '/^line/ {print "awk: "$0}'
 awk: line1
 awk: line2

But on Windows, the \r is discarded (awk treats the input as a single record):

 $ echo -e "line1\r\nline2"|awk -v RS="\r\n" '/^line/ {print "awk: "$0}'
 awk: line1
 line2

Windows: GNU Awk 4.0.1
Linux: GNU Awk 3.1.8

EDIT from @EdMorton (sorry if this is an unwanted add-on, but I think maybe this helps demonstrate the problem):

Consider this RS setting and input (on cygwin):

 $ awk 'BEGIN{printf "\"%s\"\n", RS}' | cat -v
 "
 "
 $ echo -e "line1\r\nline2" | cat -v
 line1^M
 line2

This is Solaris with gawk:

 $ echo -e "line1\r\nline2" | awk '1' | cat -v
 line1^M
 line2

and this is cygwin with gawk:

 $ echo -e "line1\r\nline2" | awk '1' | cat -v
 line1
 line2

RS was just the default newline in both cases, so where did the control-M (^M) go on cygwin?

linux bash awk
2 answers

I just checked with Arnold Robbins (the gawk maintainer), and the answer is that this is done by the underlying C libraries; to stop it, you have to set the awk BINMODE variable to 3:

 $ echo -e "line1\r\nline2" | awk '1' | cat -v
 line1
 line2
 $ echo -e "line1\r\nline2" | awk -v BINMODE=3 '1' | cat -v
 line1^M
 line2

See the manual page for more information.
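
Applied to the command from the question, this might look like the sketch below. It assumes gawk (where a multi-character RS is treated as a regular expression), uses printf instead of echo -e for portability, and ends the input with \r\n so that the last record is terminated the same way as the first. The function name crlf_records is only illustrative. On systems with no text/binary distinction, BINMODE should simply have no effect.

```shell
# Hypothetical helper: split CR LF terminated input into records.
# BINMODE=3 asks gawk to use binary mode for both reading and writing,
# so the \r should still be present when RS="\r\n" is applied.
crlf_records() {
    awk -v BINMODE=3 -v RS='\r\n' '/^line/ { print "awk: " $0 }'
}

printf 'line1\r\nline2\r\n' | crlf_records
```

If the carriage returns survive, each input line should come out as its own "awk: ..." record.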


The problem seems to be specific to awk as built for Cygwin.
I tried several different things, and it seems that awk silently replaces \r\n with \n in the input.

If we just ask awk to repeat the text unchanged, it will "sanitize" the carriage return without being asked:

 $ echo -e "line1\r\nline2" | od -a
 0000000   l   i   n   e   1  cr  nl   l   i   n   e   2  nl
 0000015
 $ echo -e "line1\r\nline2" | awk '{ print $0; }' | od -a
 0000000   l   i   n   e   1  nl   l   i   n   e   2  nl
 0000014
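
The same check can be scripted so that it reports which behaviour the local awk has. This is only a sketch: the function name awk_cr_check is made up for illustration, and it counts the carriage returns that survive a pass through awk '1'.

```shell
# Hypothetical helper: report whether awk passes CR through unchanged.
awk_cr_check() {
    # Feed two CR LF terminated lines through awk and count surviving CRs.
    n=$(printf 'a\r\nb\r\n' | awk '1' | tr -cd '\r' | wc -c | tr -d ' ')
    if [ "$n" -eq 2 ]; then
        echo "preserved"
    else
        echo "stripped"
    fi
}

awk_cr_check
```

Going by the transcripts in this thread, a Cygwin gawk without BINMODE would be expected to report "stripped", and gawk on Linux or Solaris "preserved".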

However, it leaves other carriage returns untouched:

 $ echo -e "Test\rTesting\r\nTester\rTested" | awk '{ print $0; }' | od -a
 0000000   T   e   s   t  cr   T   e   s   t   i   n   g  nl   T   e   s
 0000020   t   e   r  cr   T   e   s   t   e   d  nl
 0000033

Using a custom record separator of _ ended up leaving the carriage return unchanged:

 $ echo -e "Testing\r_Tested" | awk -v RS="_" '{ print $0; }' | od -a
 0000000   T   e   s   t   i   n   g  cr  nl   T   e   s   t   e   d  nl
 0000020  nl
 0000021

The most telling example has \r\n in the data, but not as the record separator:

 $ echo -e "Testing\r\nTested_Hello_World" | awk -v RS="_" '{ print $0; }' | od -a
 0000000   T   e   s   t   i   n   g  nl   T   e   s   t   e   d  nl   H
 0000020   e   l   l   o  nl   W   o   r   l   d  nl  nl
 0000034

awk blindly converts \r\n to \n in the input, even though we didn't ask for it.

This replacement seems to occur before record splitting is applied, which explains why RS="\r\n" never matches anything. By the time awk searches for \r\n, it has already been replaced with \n in the input.
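
Given that ordering, one way to make a script indifferent to whether the \r has already been stripped is to leave RS at its default and remove a trailing carriage return from each record. This is a minimal sketch rather than something proposed in the thread; the function name strip_cr is illustrative.

```shell
# Hypothetical helper: tolerate both LF and CR LF line endings.
# RS stays at the default newline; any trailing \r is removed per record,
# so the result is the same whether or not the runtime already dropped it.
strip_cr() {
    awk '{ sub(/\r$/, ""); print "awk: " $0 }'
}

printf 'line1\r\nline2\r\n' | strip_cr
```

A carriage return in the middle of a record is left alone, since the pattern is anchored to the end of the record.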
