Regular expression matches EOF

Question


I have some data that look like

 john, dave, chris
 rick, sam, bob
 joe, milt, paul

I use this regex to match names

 /(\w.+?)(\r\n|\n|,)/

which works for the most part, but the file ends abruptly after the last word, meaning that the last value does not end with \r\n, \n, or a comma; it ends with EOF. Is there a way to match EOF in regex so that I can put it in this second group?

+58

regex

Ryan Jul 23 '09 at 12:01


9 answers

EOF is not really a character. If you have a multi-line string, then $ will match the end of the string as well as the end of a line.

In Perl and its relatives, \A and \Z match the beginning and end of the string, ignoring any internal line breaks.

The GNU extensions for POSIX regular expressions use \` and \' for the same thing.
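For what it's worth, here is a small JavaScript sketch of the difference between "end of a line" and "end of the string". JavaScript has no \A or \Z, so this only shows how $ behaves with and without the m (multi-line) flag; the sample text is made up for illustration.

```javascript
// Made-up sample text: three lines, no trailing newline.
const text = "one\ntwo\nthree";

// Without the m flag, $ matches only at the end of the whole string.
const endOfString = Array.from(text.matchAll(/\w+$/g), (m) => m[0]);

// With the m flag, $ also matches just before every line break.
const endOfEachLine = Array.from(text.matchAll(/\w+$/gm), (m) => m[0]);

console.log(endOfString);   // [ 'three' ]
console.log(endOfEachLine); // [ 'one', 'two', 'three' ]
```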

+16

paxdiablo Jul 23 '09 at 12:05


In Visual Studio, you can find EOF as follows: $(?![\r\n]). This works whether your line endings are CR, CRLF or just LF.

As a bonus, you can ensure that all of your code files have the final newline marker, for example:

  Find What: (?<![\r\n])$(?![\r\n])
  Replace With: \r\n
  Use Regular Expressions: checked
  Look at these file types: *.cs, *.cshtml, *.js

How it works:

Find any end of line (a zero-width match) that is not preceded by CR or LF and is not followed by CR or LF. Some thought will show you why this works!

Note that the replacement must be your desired line terminator, be it CR, LF, or CRLF.
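Outside the editor, the same idea can be sketched in JavaScript. This is only an illustration under assumptions: ensureFinalNewline is a hypothetical helper name, the m flag is used to mimic a line-oriented $, and lookbehind needs a reasonably recent engine (for example, a current Node.js).

```javascript
// Pattern from the answer: an end of line with no CR or LF on either side.
// The m flag is an assumption, chosen to mimic an editor's line-oriented $.
const missingFinalNewline = /(?<![\r\n])$(?![\r\n])/m;

// Hypothetical helper: append `eol` only when the text lacks a final line break.
function ensureFinalNewline(text, eol = "\n") {
  return text.replace(missingFinalNewline, eol);
}

console.log(JSON.stringify(ensureFinalNewline("a\nb")));           // "a\nb\n"
console.log(JSON.stringify(ensureFinalNewline("a\nb\n")));         // "a\nb\n"
console.log(JSON.stringify(ensureFinalNewline("a\r\nb", "\r\n"))); // "a\r\nb\r\n"
```

Text that already ends with a line break has no position satisfying both lookarounds, so it comes back unchanged.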

+7

ErikE Jan 23 '16 at 1:54


Contrast the behavior of Ryan's suggested \Z with \z:

 $ perl -we 'my $corpus = "hello\n"; $corpus =~ s/\Z/world/g; print(":$corpus:\n")'
 :helloworld
 world:
 $ perl -we 'my $corpus = "hello\n"; $corpus =~ s/\z/world/g; print(":$corpus:\n")'
 :hello
 world:
 $

perlre sez:

 \Z  Match only at end of string, or before newline at the end
 \z  Match only at end of string

The translation of the test case into Ruby (1.8.7, 1.9.2) behaves the same way.
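JavaScript has neither \Z nor \z, but the contrast can be approximated. The sketch below is an emulation under assumptions, not something from perlre: without the m flag, $ stands in for \z, and the lookahead (?=\n?$) stands in for \Z. The variable names are made up.

```javascript
const corpus = "hello\n";

// Rough stand-in for Perl's \Z: end of input, or just before one final newline.
const likeBigZ = /(?=\n?$)/g;
// Rough stand-in for Perl's \z: without the m flag, $ is the very end of input.
const likeSmallZ = /$/g;

const withBigZ = corpus.replace(likeBigZ, "world");
const withSmallZ = corpus.replace(likeSmallZ, "world");

console.log(JSON.stringify(withBigZ));   // "helloworld\nworld"
console.log(JSON.stringify(withSmallZ)); // "hello\nworld"
```

The first pattern matches twice (before the final newline and at the very end), which is why "world" shows up twice, just as in the Perl run.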

+5

Martin Dorey Nov 30 '12 at 18:54


Do you really need to capture line breaks? If not, this regex should be all you need:

 /\w+/

That assumes all the substrings you want to match consist entirely of word characters, as in your example.
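A quick JavaScript sketch of that approach, assuming the sample data is three newline-separated lines with nothing after the last name:

```javascript
// Sample data from the question; there is no newline after "paul".
const data = "john, dave, chris\nrick, sam, bob\njoe, milt, paul";

// With the g flag, match() returns every run of word characters.
const names = data.match(/\w+/g);

console.log(names.length);            // 9
console.log(names[names.length - 1]); // paul
```

Because the delimiters are never part of the match, it makes no difference whether the last name is followed by a comma, a newline, or nothing at all.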

+2

Alan Moore Jul 23 '09 at 12:33


Maybe try $ (EOL/EOF) instead of (\r\n|\n)?

 /\"(.+?)\".+?(\w.+?)$/

+2

Marc Gravell Jul 23 '09 at 13:34


Assuming you are using the proper modifier to treat the string as a whole (not line by line; and if \n works for you, you are), just add another alternative, the end of the string: (\r\n|\n|,|$)
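To see the effect, here is a minimal JavaScript sketch comparing the pattern from the question with the version that adds $. It assumes the sample data is three newline-separated lines with no trailing newline; the helper name is made up.

```javascript
// Sample data from the question; there is no newline after "paul".
const data = "john, dave, chris\nrick, sam, bob\njoe, milt, paul";

// Original pattern: every name must be followed by a delimiter.
const original = /(\w.+?)(\r\n|\n|,)/g;
// Same pattern with $ as a fourth alternative (no m flag, so $ is end of input).
const withEnd = /(\w.+?)(\r\n|\n|,|$)/g;

// Hypothetical helper: collect the first capture group of every match.
const names = (re) => Array.from(data.matchAll(re), (m) => m[1]);

console.log(names(original).length); // 8, "paul" is missed
console.log(names(withEnd).length);  // 9, "paul" is captured
```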

+1

leafnode Jul 23 '09 at 12:04


/(\w.+?)(\r\n|\n|,|$)/

0

cube Jul 23 '09 at 12:04


I was recently looking for something like this, but for JavaScript.

Posting it here so that anyone who has the same problem can benefit.

 var matchEndOfInput = /$(?![\r\n])/gm;

Basically, this matches an end of line that is not followed by any carriage return or newline characters, which leaves only the very end of the input. It is essentially the same as Perl's \z, but for JavaScript.

0

Zlatin Zlatev Nov 20 '17 at 13:29


Ryan · Accepted Answer · 2009-07-23 12:03

The answer to this question is \Z. It took me a while to figure out, but it works now. Note that, conversely, \A matches the beginning of the whole string (as opposed to ^ and $, which match the beginning and end of a single line).
