In the regex engines of every language I'm familiar with, the notation .* indicates matching zero or more characters. Consider the following JavaScript code:
    var s = "baaabcccb";
    var pattern = new RegExp("b.*b");
    var match = pattern.exec(s);
    if (match) alert(match);
This displays baaabcccb.
The same thing happens with Python:
    >>> import re
    >>> s = "baaabcccb"
    >>> m = re.search("b.*b", s)
    >>> m.group(0)
    'baaabcccb'
Why do both of these languages match "baaabcccb" and not just "baaab"? The way I read the pattern b.*b is "find a substring that starts with b, then has any number of other characters, and then ends with b". Both baaab and baaabcccb satisfy this requirement, but both JavaScript and Python match the latter. I would expect the pattern to match baaab, simply because this substring satisfies the requirement and appears first.
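To check my reading of the pattern, here is a minimal Python sketch. The names candidates, full_matches, and found are only mine, for illustration. It tests each candidate substring on its own with re.fullmatch, then runs the same search as before on the whole string:

```python
import re

pattern = re.compile("b.*b")
s = "baaabcccb"

# Test each candidate on its own: does the whole candidate match the pattern?
candidates = ["baaab", "baaabcccb"]
full_matches = {c: pattern.fullmatch(c) is not None for c in candidates}
print(full_matches)  # {'baaab': True, 'baaabcccb': True}

# Searching the full string still reports only the longer candidate.
found = pattern.search(s).group(0)
print(found)  # baaabcccb
```

So both substrings really are valid matches for b.*b when taken alone, which is why the result of the search surprises me.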
So why does the pattern match baaabcccb in this case? And is there a way to change this behavior (in any language) so that it matches baaab?