Python regex word boundary with unexpected results

Question

    import re

    sstring = "ON Any ON Any"
    regex1 = re.compile(r''' \bON\bANY\b''', re.VERBOSE)
    regex2 = re.compile(r'''\b(ON)?\b(Any)?''', re.VERBOSE)
    regex3 = re.compile(r'''\b(?:ON)?\b(?:Any)?''', re.VERBOSE)

    for a in regex1.findall(sstring): print(a)
    print("----------")
    for a in regex2.findall(sstring): print(a)
    print("----------")
    for a in regex3.findall(sstring): print(a)
    print("----------")

Output:

    ----------
    ('ON', '')
    ('', '')
    ('', 'Any')
    ('', '')
    ('ON', '')
    ('', '')
    ('', 'Any')
    ('', '')
    ----------
    ON

    Any

    ON

    Any

    ----------

After reading many articles on the Internet and on SO, I think I still do not understand the regex word boundary: \b

The first regular expression does not give the expected result. I think it should give me "ON Any ON Any", but it does not.

The second regular expression gives me tuples, and I don't know why, or what ('', '') means.

The third regular expression prints the matches on separate lines, with empty lines in between.

Could you help me figure this out?

python regex word-boundary

Ahmad Kamal ELSaman Oct 05 '16 at 13:41

1 answer

Wiktor Stribiżew · Accepted Answer · 2016-10-05T14:04:01+0000

Note that to match ON ANY you need to add an escaped space between ON and ANY (escaped because you use the re.VERBOSE flag, which otherwise ignores whitespace in the pattern). The \b word boundary is a zero-width assertion: it does not consume any text, it only asserts a position between specific characters. That is why your first approach, re.compile(r''' \bON\bANY\b''', re.VERBOSE), fails.

Use (re.IGNORECASE is there so that ANY in the pattern also matches Any in the string):

 rx = re.compile(r''' \bON\ ANY\b ''', re.VERBOSE|re.IGNORECASE)

See Python Demo
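A minimal sketch of the difference, using the string from the question. The names broken and fixed are illustrative only and do not appear in the original post.

```python
import re

s = "ON Any ON Any"

# \b is zero-width, so nothing in this pattern consumes the space between
# "ON" and "Any"; unescaped whitespace is ignored under re.VERBOSE.
broken = re.compile(r''' \bON\bANY\b ''', re.VERBOSE | re.IGNORECASE)

# An escaped space stays a literal space even in verbose mode.
fixed = re.compile(r''' \bON\ ANY\b ''', re.VERBOSE | re.IGNORECASE)

broken_matches = broken.findall(s)
fixed_matches = fixed.findall(s)

print(broken_matches)  # []
print(fixed_matches)   # ['ON Any', 'ON Any']
```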

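re.compile(r'''\b(ON)?\b(Any)?''', re.VERBOSE) returns tuples because you defined two capturing groups (...) in the pattern: when a pattern contains more than one group, findall returns one tuple of group values per match, with '' for a group that did not take part in the match.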
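To illustrate how the number of capturing groups changes what findall returns, here is a small sketch on a shortened input. The variable names are made up for this example.

```python
import re

s = "ON Any"

# No groups: findall returns each whole match as a string.
no_groups = re.findall(r'\bON\b', s)

# One capturing group: findall returns that group's text.
one_group = re.findall(r'\b(ON)\b', s)

# Two capturing groups: findall returns one tuple per match;
# a group that did not participate shows up as ''.
two_groups = re.findall(r'\b(ON)?\b(Any)?', s)

print(no_groups)   # ['ON']
print(one_group)   # ['ON']
print(two_groups)  # [('ON', ''), ('', ''), ('', 'Any'), ('', '')]
```

The ('', '') entries are matches where neither optional group matched anything, so only the word boundary itself was matched.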

re.compile(r'''\b(?:ON)?\b(?:Any)?''', re.VERBOSE) matches the optional sequences ON or Any, so you get these words as values. You also get empty values because this regular expression can match just a word boundary (all other subpatterns are optional).
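One way to see where the empty values come from is to look at match positions with finditer. This is only a sketch; filtering out the zero-length matches afterwards is one possible workaround, not something the original answer prescribes.

```python
import re

s = "ON Any ON Any"
rx = re.compile(r'\b(?:ON)?\b(?:Any)?')

# finditer exposes where each match starts and ends; an empty match
# has start == end and sits on a word boundary.
spans = [(m.start(), m.end(), m.group()) for m in rx.finditer(s)]
for start, end, text in spans:
    print(start, end, repr(text))

# Dropping the zero-length matches leaves only the words.
words = [text for _, _, text in spans if text]
print(words)  # ['ON', 'Any', 'ON', 'Any']
```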

More about word boundaries:

Word Boundaries at Regular-Expressions.info
Java Regex Word Boundaries (this one is also about regex word boundaries and applies here as well)
