Why does the Perl regex '*?' remain greedy?

I run a simple program:

$_ = '/login/.htaccess/.htdf'; s!(/\.ht.*?)$!/!; print "$_ $1"; 

OUT
/login/ /.htaccess/.htdf

I want this regular expression to match only /.htdf.

Example 2:

 $_ = 'abcbc'; m/(b.*?)$/; print "$_ $1\n"; 

OUT
abcbc bcbc

I expected bc.

Why is *? still greedy? (I want a minimal match.)

4 answers

Atoms are matched in sequence, and each atom after the first must match at the position where the previous atom stopped matching. (The first atom is implicitly preceded by \A(?s:.)*? .) This means that .* / .*? does not get to decide where it starts matching; it only decides where it stops matching.
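As a rough illustration of that point, the sketch below uses Perl's built-in @- and @+ arrays to report where a match starts and ends. The helper name match_span is made up for this example and is not part of the original answer.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (start, end) offsets of the overall match,
# or an empty list if the pattern does not match.
sub match_span {
    my ($str, $re) = @_;
    return $str =~ $re ? ($-[0], $+[0]) : ();
}

my $s = 'abcbc';

# Both patterns start at the leftmost "b" (offset 1); .*? only controls the end.
printf "b.*?\$ : start=%d end=%d\n", match_span($s, qr/b.*?$/);   # start=1 end=5
printf "b.*?  : start=%d end=%d\n",  match_span($s, qr/b.*?/);    # start=1 end=2
```

With the trailing $ the lazy quantifier has to stretch to the end of the string; without it, it matches nothing at all. In both cases the start of the match is the same.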

Example 1

It's not being greedy. \.ht brings the match to position 10, and at position 10 the least that .*? can match while still letting the rest of the pattern match is access/.htdf . In fact, that is the only thing .*? can match at position 10 and still have the rest of the pattern match.

I think you want to remove the last part of the path if it starts with .ht , leaving the preceding / in place. To do that, you can use either of the following:

 s{/\.ht[^/]*$}{/} 

or

 s{/\K\.ht[^/]*$}{} 
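A quick way to check that the two substitutions behave the same is to run both on the path from the question and on a path that should be left alone. The helper names strip_ht_replace and strip_ht_keep and the second sample path are assumptions for this sketch; \K needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical helpers wrapping the two substitutions shown above.
sub strip_ht_replace { my ($p) = @_; $p =~ s{/\.ht[^/]*$}{/};  return $p; }
sub strip_ht_keep    { my ($p) = @_; $p =~ s{/\K\.ht[^/]*$}{}; return $p; }

for my $path ('/login/.htaccess/.htdf', '/login/index.html') {
    # Print the input followed by the result of each variant.
    printf "%-24s -> %s | %s\n",
        $path, strip_ht_replace($path), strip_ht_keep($path);
}
```

Because [^/]* cannot cross a slash, the pattern can only succeed at the last path component, so /.htaccess in the middle of the path is never touched.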

Example 2

It's not being greedy. b brings the match to position 2, and at position 2 the least that .*? can match while still letting the rest of the pattern match is cbc . In fact, that is the only thing .*? can match at position 2 and still have the rest of the pattern match.

You may be looking for

 /b[^b]*$/ 

or

 /b(?:(?!b).)*$/ # You'd use this if "b" was really more than one char. 
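The following sketch runs both patterns with a capture group added, plus a multi-character variant of the second one. The helper name capture and the sample string xfooyfooz are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns what the first capture group matched, or undef.
sub capture {
    my ($str, $re) = @_;
    return $str =~ $re ? $1 : undef;
}

print capture('abcbc', qr/(b[^b]*)$/), "\n";                 # bc
print capture('abcbc', qr/(b(?:(?!b).)*)$/), "\n";           # bc
# Multi-character delimiter, where a negated character class cannot be used.
print capture('xfooyfooz', qr/(foo(?:(?!foo).)*)$/), "\n";   # fooz
```

In each case the part after the delimiter is forbidden from containing another delimiter, so only the last occurrence can reach the end of the string.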

You can use a negative lookahead for this:

 ~/(\.ht(?!.*\.ht).*)$~ 

(?!.*\.ht) is a negative lookahead that asserts that no other .ht occurs after the current .ht , so the pattern matches only the last .ht .

.*? will not be greedy if some pattern follows it.

The code:

 $str = '/login/.htaccess/.htdf'; $str =~ s~/(\.ht(?!.*\.ht).*)$~/~m; print "$str\n"; 

Why not? Greediness works in the forward direction, not backward. In non-greedy mode the state machine starts matching and checks at each step whether the rest of the pattern can match, instead of consuming everything and then backtracking. That does not guarantee you a "minimal match".

Perhaps you want to avoid matching / ? As in s{/\.ht[^/]*$}{/} .
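To see the forward-only behaviour, here is a small sketch that compares the lazy and greedy quantifier when a literal c follows them. The sample string is the one from the question; the variable names are arbitrary.

```perl
use strict;
use warnings;

my $s = 'abcbc';

# Both matches begin at the leftmost "b"; only the end point differs.
my ($lazy)   = $s =~ /(b.*?)c/;   # stops before the first "c" it can use
my ($greedy) = $s =~ /(b.*)c/;    # extends to the last "c" it can reach

print "lazy:   $lazy\n";     # b
print "greedy: $greedy\n";   # bcb
```

Neither quantifier moves the start of the match to a later b, which is why making .* lazy does not select the shortest possible substring.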


The regular expression works the way you wrote it.
But if you want to use the dot metacharacter, it has to be greedy.

This should work: s!.*/\K\.ht.*$!! . It basically strips the trailing .ht... part.

If you want to be specific, you will need s!/\K\.htdf$!!
