Nongreedy regex is greedy

I have the following text

tooooooooooooon 

According to the book I am reading, when ? follows any quantifier, that quantifier becomes non-greedy.

My regex to*?n still returns tooooooooooooon.

It should return ton, right?

Any idea why?

5 answers

A regular expression can only match a fragment of existing text.

Since the substring "ton" does not exist anywhere in your string, it cannot be the result of a match. A match only ever returns a substring of the source string.

EDIT: To be clear, if you used the string below, with an extra "n"

 toooooooonoooooon 

then the regular expression (which does not specify "o")

 t.*n 

will match the following (as many characters as possible before "n")

 toooooooonoooooon 

but the regex

 t.*?n 

will match only the following (as few characters as possible before "n")

 toooooooon 
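
The difference described above can be checked with a short script. This is only a sketch using Python's standard re module, not code from the question; the variable names are made up for illustration.

```python
import re

original = "tooooooooooooon"        # the text from the question
with_extra_n = "toooooooonoooooon"  # the variant with an extra "n"

# On the original text the only "n" is the last character,
# so the lazy and the greedy pattern return the same substring.
lazy_original = re.search(r"to*?n", original).group()
greedy_original = re.search(r"to*n", original).group()

# With an extra "n" in the middle, the two patterns stop at different places.
greedy_extra = re.search(r"t.*n", with_extra_n).group()
lazy_extra = re.search(r"t.*?n", with_extra_n).group()

print(lazy_original == greedy_original)  # both are the whole string
print(greedy_extra)                      # runs to the last "n"
print(lazy_extra)                        # stops at the first "n"
```

The lazy quantifier only changes which match is chosen when more than one is possible; it cannot produce text that is not in the input.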

A regular expression always tries to match.

Your expression says the following:

 A 't', followed by *as few as possible* 'o's, followed by an 'n'.

This means that as many o's as necessary will be matched, because there is an 'n' at the end that the expression is trying to reach. Matching all the o's is its only way to succeed.
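
One way to see that there is only one way to succeed is to list every substring of the text that the pattern could match at all. The brute-force check below is a sketch in Python (the names are illustrative); it ignores greediness entirely by testing each candidate substring with fullmatch.

```python
import re

text = "tooooooooooooon"
pattern = re.compile(r"to*n")

# Collect every substring of the text that the pattern matches in full.
candidates = {
    text[i:j]
    for i in range(len(text))
    for j in range(i + 1, len(text) + 1)
    if pattern.fullmatch(text[i:j])
}

print(candidates)  # contains exactly one entry: the whole text
```

Because the set contains a single candidate, greedy and lazy quantifiers have nothing to choose between.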


Regexes try to match everything in them. Because the n cannot be matched without first matching every o in toooon, everything is matched. Also, because you use o*? instead of o+?, you are not requiring an o to be present at all.

Example in Perl

 $a = "toooooo";
 $b = "toooooon";
 if ($a =~ m/(to*?)/) { print $1,"\n"; }
 if ($b =~ m/(to*?n)/) { print $1,"\n"; }

 ~>perl ex.pl
 t
 toooooon
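
The point about o*? versus o+? is not covered by the Perl example, so here is a small sketch of it in Python (chosen only for illustration; the variable names are made up). It assumes the standard re module.

```python
import re

# "*?" allows zero repetitions, "+?" demands at least one.
star_alone = re.search(r"to*?", "toooooo").group()  # nothing follows, zero o's suffice
plus_alone = re.search(r"to+?", "toooooo").group()  # one "o" is required

# On a text with no "o" at all, only the "*?" version can match.
star_tn = re.search(r"to*?n", "tn")
plus_tn = re.search(r"to+?n", "tn")

print(star_alone)        # t
print(plus_alone)        # to
print(star_tn.group())   # tn
print(plus_tn)           # None
```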

A regex always does its best to match. The only thing you achieve in this case is slowing the engine down, by making it backtrack into the /o*?/ node once for every 'o' in "tooooon". With normal (greedy) matching, it would take as many 'o's as it can on the first pass. Since the next element to match is 'n', which 'o' can never match, there is little point in using minimal matching here. In fact, when a normal match fails, it also takes quite a while to fail: it has to give back every 'o' until there is nothing left to backtrack into. In this case I would use maximal (possessive) matching, /to*+n/. The 'o*+' takes everything it can and never gives any of it back, so that when the match fails, it fails quickly.

Minimum RE:

 'toooooon' ~~ /to*?n/

  toooooon
 {t} match [t]
 [t] match [o] 0 times
 [t] <n> fail to match [n] -> retry [o]
 [t] {o} match [o] 1 times
 [t] [o] <n> fail to match [n] -> retry [o]
 [t] [o] {o} match [o] 2 times
 [t] [o] [o] <n> fail to match [n] -> retry [o]

 .  .  .  .

 [t] [o] [o] [o] [o] {o} match [o] 5 times
 [t] [o] [o] [o] [o] [o] <n> fail to match [n] -> retry [o]
 [t] [o] [o] [o] [o] [o] {o} match [o] 6 times
 [t] [o] [o] [o] [o] [o] [o] {n} match [n]

Normal RE:

(NOTE: similar to Maximal RE)

 'toooooon' ~~ /to*n/

  toooooon
 {t} match [t]
 [t] {o} {o} {o} {o} {o} {o} match [o] 6 times
 [t] [o] [o] [o] [o] [o] [o] {n} match [n]

Minimum RE Failure:

 'toooooo' ~~ /to*?n/

  toooooo

 .  .  .  .

 .  .  .  .

 [t] [o] [o] [o] [o] {o} match [o] 5 times
 [t] [o] [o] [o] [o] [o] <n> fail to match [n] -> retry [o]
 [t] [o] [o] [o] [o] [o] {o} match [o] 6 times
 [t] [o] [o] [o] [o] [o] [o] <n> fail to match [n] -> retry [o]
 [t] [o] [o] [o] [o] [o] [o] <o> fail to match [o] 7 times -> match failed

Normal RE Failure:

 'toooooo' ~~ /to*n/

  toooooo
 {t} match [t]
 [t] {o} {o} {o} {o} {o} {o} match [o] 6 times
 [t] [o] [o] [o] [o] [o] [o] <n> fail to match [n] -> retry [o]
 [t] [o] [o] [o] [o] [o] match [o] 5 times
 [t] [o] [o] [o] [o] [o] <n> fail to match [n] -> retry [o]

 .  .  .  .

 [t] [o] match [o] 1 times
 [t] [o] <n> fail to match [n] -> retry [o]
 [t] match [o] 0 times
 [t] <n> fail to match [n] -> match failed

Maximum RE Failure:

 'toooooo' ~~ /to*+n/

  toooooo
 {t} match [t]
 [t] {o} {o} {o} {o} {o} {o} match [o] 6 times
 [t] [o] [o] [o] [o] [o] [o] <n> fail to match [n] -> match failed
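
The traces above can be approximated with a toy matcher that counts how many times the final 'n' is tried. This is a deliberately simplified sketch in Python, not how any real engine is implemented: the function name match_t_o_n and its mode argument are invented for illustration, the pattern is hard-coded as t, o*, n, and the match is only attempted at the start of the text.

```python
def match_t_o_n(text, mode):
    """Toy matcher for the fixed pattern: t, then o*, then n, anchored at index 0.

    mode is "lazy", "greedy" or "possessive" (hypothetical labels).
    Returns (matched, attempts), where attempts counts tries at the "n".
    """
    if not text.startswith("t"):
        return False, 0

    # Count the consecutive "o" characters right after the "t".
    max_os = 0
    while 1 + max_os < len(text) and text[1 + max_os] == "o":
        max_os += 1

    if mode == "lazy":
        counts = range(0, max_os + 1)    # try 0, 1, 2, ... o's
    elif mode == "greedy":
        counts = range(max_os, -1, -1)   # take all o's, then give them back one by one
    elif mode == "possessive":
        counts = [max_os]                # take all o's and never give any back
    else:
        raise ValueError(f"unknown mode: {mode}")

    attempts = 0
    for k in counts:
        attempts += 1
        pos = 1 + k                      # index where the "n" would have to be
        if pos < len(text) and text[pos] == "n":
            return True, attempts
    return False, attempts


matching = "t" + "o" * 6 + "n"   # 'toooooon' from the traces
failing = "t" + "o" * 6          # 'toooooo' from the traces

for mode in ("lazy", "greedy", "possessive"):
    print(mode, match_t_o_n(matching, mode), match_t_o_n(failing, mode))
```

In this simplified model the lazy version needs seven tries at the 'n' to succeed where the greedy one needs one, both need seven tries to fail, and the possessive version fails after a single try.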

The string you are searching in (the haystack) does not contain the substring "ton".

However, it contains the substring "tooooooooooooon".
