In the regex engines of every language I'm familiar with, the notation .* means "match zero or more characters". Consider the following JavaScript code:
    var s = "baaabcccb";
    var pattern = new RegExp("b.*b");
    var match = pattern.exec(s);
    if (match) alert(match);
This displays baaabcccb.
The same thing happens with Python:
    >>> import re
    >>> s = "baaabcccb"
    >>> m = re.search("b.*b", s)
    >>> m.group(0)
    'baaabcccb'
Why do both of these languages match "baaabcccb" and not just "baaab"? The way I read the pattern b.*b is "find a substring that starts with b, then has any number of other characters, and then ends with b". Both baaab and baaabcccb satisfy this requirement, but both JavaScript and Python match the latter. I would expect it to match baaab, simply because that substring satisfies the requirement and ends first.
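To make the ambiguity concrete, here is a small Python check. It is only a sketch: the names candidates, results and found are mine, chosen for illustration. It tests each candidate substring against the pattern on its own with re.fullmatch, then runs the same search as above over the whole string:

```python
import re

s = "baaabcccb"
pattern = "b.*b"

# Test each candidate in isolation: does the whole candidate match the pattern?
candidates = ["baaab", "baaabcccb"]
results = {c: re.fullmatch(pattern, c) is not None for c in candidates}
print(results)

# Search the full string, as in the examples above.
found = re.search(pattern, s).group(0)
print(found)
```

Both candidates pass the isolated check, yet the search over the full string returns the longer one.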
So why does the pattern match baaabcccb in this case? And is there a way to change this behavior (in any language) so that it matches baaab instead?