Python regex word boundary with unexpected results

    import re

    sstring = "ON Any ON Any"

    regex1 = re.compile(r''' \bON\bANY\b''', re.VERBOSE)
    regex2 = re.compile(r'''\b(ON)?\b(Any)?''', re.VERBOSE)
    regex3 = re.compile(r'''\b(?:ON)?\b(?:Any)?''', re.VERBOSE)

    for a in regex1.findall(sstring):
        print(a)
    print("----------")
    for a in regex2.findall(sstring):
        print(a)
    print("----------")
    for a in regex3.findall(sstring):
        print(a)
    print("----------")

Output:

    ----------
    ('ON', '')
    ('', '')
    ('', 'Any')
    ('', '')
    ('ON', '')
    ('', '')
    ('', 'Any')
    ('', '')
    ----------
    ON

    Any

    ON

    Any

    ----------


After reading many articles on the Internet and on SO, I still do not understand the regular expression word boundary, \b.

The first regular expression does not give the expected result. I think it should match "ON Any ON Any", but it matches nothing.

The second regular expression gives me tuples, and I do not understand why, or what ('', '') means.

The third regular expression prints each match on its own line, with empty lines in between.

Could you help me figure this out?

python regex word-boundary
1 answer

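Note that to match ON ANY you need to add a space between ON and ANY, and it must be escaped because you use the re.VERBOSE flag, which ignores unescaped whitespace in the pattern. The \b word boundary is a zero-width assertion: it does not consume any text, it only asserts a position between specific characters. That is why your first approach, re.compile(r''' \bON\bANY\b''', re.VERBOSE), fails: after ON the pattern requires ANY to follow immediately, but the input has a space there. The pattern also spells ANY in upper case while the input contains Any, so it needs re.IGNORECASE as well.

Use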

 rx = re.compile(r''' \bON\ ANY\b ''', re.VERBOSE|re.IGNORECASE) 

See Python Demo
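The difference between the failing pattern and the corrected one can be checked directly. This is a minimal sketch using the string from the question; the names `broken` and `fixed` are illustrative only and do not come from the original post.

```python
import re

sstring = "ON Any ON Any"

# \b is zero-width: it succeeds at position 2 (between "N" and the space)
# but consumes nothing, so "ANY" would have to start right after "ON".
broken = re.compile(r''' \bON\bANY\b ''', re.VERBOSE | re.IGNORECASE)
print(broken.findall(sstring))   # []

# In verbose mode an unescaped space is ignored, so the literal space
# between the two words has to be escaped with a backslash.
fixed = re.compile(r''' \bON\ ANY\b ''', re.VERBOSE | re.IGNORECASE)
print(fixed.findall(sstring))    # ['ON Any', 'ON Any']
```

Even with re.IGNORECASE added, the first pattern finds nothing, which suggests the missing space (not the letter case) is the main problem.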

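re.compile(r'''\b(ON)?\b(Any)?''', re.VERBOSE) returns tuples because the pattern defines capturing groups (...). When a pattern has more than one capturing group, re.findall returns one tuple of group values per match, and a group that did not take part in the match appears as an empty string. ('', '') therefore means that the pattern matched at a word boundary without matching either word.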
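To see where each tuple comes from, it can help to look at the match objects themselves. The sketch below is only an illustration with the question's string; the name `with_groups` is made up for this example.

```python
import re

sstring = "ON Any ON Any"

# Two capturing groups: findall returns one (group1, group2) tuple per match.
# A group that did not take part in the match is reported as ''.
with_groups = re.compile(r'\b(ON)?\b(Any)?')
print(with_groups.findall(sstring)[:3])   # [('ON', ''), ('', ''), ('', 'Any')]

# finditer exposes the position and full text of every match, including
# the empty ones that findall shows as ('', '').
for m in with_groups.finditer(sstring):
    print(m.span(), repr(m.group(0)), m.groups())
```

Note that `m.groups()` reports a non-participating group as None, whereas `findall` substitutes an empty string.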

re.compile(r'''\b(?:ON)?\b(?:Any)?''', re.VERBOSE) matches the optional sequences ON and Any, and because the groups are non-capturing, re.findall returns the matched text itself, so you get these words as values. You also get empty values because this regular expression can match a word boundary alone (all the other subpatterns are optional).
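The empty values disappear once the pattern can no longer succeed on a word boundary alone. The following sketch contrasts the two cases; the alternation pattern `required` is a hypothetical alternative for illustration, not something proposed in the original answer.

```python
import re

sstring = "ON Any ON Any"

# Everything after \b is optional, so the pattern also succeeds with an
# empty match at a word boundary where neither word follows.
optional = re.compile(r'\b(?:ON)?\b(?:Any)?')
print(optional.findall(sstring))
# ['ON', '', 'Any', '', 'ON', '', 'Any', '']

# Hypothetical alternative: require one of the two words, so an empty
# match is no longer possible.
required = re.compile(r'\b(?:ON|Any)\b')
print(required.findall(sstring))
# ['ON', 'Any', 'ON', 'Any']
```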

