Split on \b when your regex engine doesn't support it

Question

How can I split on a word boundary in a regular expression engine that doesn't support it?

Python's re can match on \b, but it doesn't seem to support splitting on it. I seem to recall dealing with other regex engines that had the same limitation.

input example:

"hello, foo"

expected output:

 ['hello', ', ', 'foo']

python actual output:

 >>> re.compile(r'\b').split('hello, foo')
 ['hello, foo']

+4

python regex

ʞɔıu Dec 29 '09 at 20:22

5 answers

You can also use re.findall() for this:

 >>> re.findall(r'.+?\b', 'hello, foo')
 ['hello', ', ', 'foo']
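One caveat with this pattern, shown as a small sketch: `.+?\b` needs a word boundary after every piece, so trailing punctuation is dropped silently. The alternation `\w+|\W+` is a swapped-in alternative, not something from the answer above, and the input ending in `!` is made up for illustration.

```python
import re

text = 'hello, foo!'  # hypothetical input with trailing punctuation

# No word boundary follows '!', so the last piece never matches.
lazy = re.findall(r'.+?\b', text)

# Alternating runs of word and non-word characters cover the whole string.
runs = re.findall(r'\w+|\W+', text)

print(lazy)  # ['hello', ', ', 'foo']
print(runs)  # ['hello', ', ', 'foo', '!']
```

Joining `runs` back together reproduces the input, which is a quick way to check that nothing was lost.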

+2

Pez Dec 29 '09 at 21:41

OK, I figured it out:

Put the split pattern in a capturing group and the separators will be included in the output. You can use either \w+ or \W+:

 >>> re.compile(r'(\w+)').split('hello, foo')
 ['', 'hello', ', ', 'foo', '']

To get rid of the empty strings, pass the result through filter() with None as the filter function, which drops everything that doesn't evaluate to true (on Python 3, wrap it in list(), because filter() returns an iterator there):

 >>> list(filter(None, re.compile(r'(\w+)').split('hello, foo')))
 ['hello', ', ', 'foo']

Edit: CMS points out that if you use \W+ you don't need filter().
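A short sketch of both variants side by side. The helper name `split_keep` and the input ending in `!` are assumptions for illustration. It suggests that `(\W+)` only avoids empty strings when the input starts and ends with a word character, so filtering remains a safe default.

```python
import re

def split_keep(pattern, text):
    # Hypothetical helper: split on a captured pattern, keep the
    # separators, and drop any empty strings from the result.
    return [piece for piece in re.split(pattern, text) if piece]

word_split = re.split(r'(\w+)', 'hello, foo')
nonword_split = re.split(r'(\W+)', 'hello, foo')
trailing = re.split(r'(\W+)', 'hello, foo!')

print(word_split)     # ['', 'hello', ', ', 'foo', '']
print(nonword_split)  # ['hello', ', ', 'foo']
print(trailing)       # ['hello', ', ', 'foo', '!', '']
print(split_keep(r'(\W+)', 'hello, foo!'))  # ['hello', ', ', 'foo', '!']
```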

+1

ʞɔıu Dec 29 '09 at 20:39

Try

 >>> re.compile(r'\W\b').split('hello, foo')
 ['hello,', 'foo']

This splits on a non-word character that comes right before a boundary. In your example, there is nothing to split on.
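To see why a bare \b leaves split() with nothing to consume, it can help to list where the boundaries fall. The sketch below uses a hypothetical helper name, `split_on_boundaries`: it finds the zero-width match positions with re.finditer() and slices the string there, which emulates a split on \b without relying on how split() treats empty matches.

```python
import re

def split_on_boundaries(text):
    # Hypothetical helper: cut the string at every zero-width \b match.
    cuts = {0, len(text)}  # always include both ends of the string
    cuts.update(m.start() for m in re.finditer(r'\b', text))
    ordered = sorted(cuts)
    # Slice between consecutive cut points.
    return [text[a:b] for a, b in zip(ordered, ordered[1:])]

positions = [m.start() for m in re.finditer(r'\b', 'hello, foo')]
print(positions)                          # [0, 5, 7, 10]
print(split_on_boundaries('hello, foo'))  # ['hello', ', ', 'foo']
```

Every match has zero width, so each position marks a cut point rather than a separator to remove.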

0

gnud Dec 29 '09 at 20:31

Interesting. So far, most of the RE engines I've tried have performed this split.

I played around a bit and found that re.compile(r'(\W+)').split('hello, foo') gives the expected result... Not sure whether it is reliable, though.

0

Philho Dec 29 '09 at 20:39

CMS · Accepted Answer · 2008-12-29T20:38:26+0000

(\W+) can give you the expected result:

 >>> re.compile(r'(\W+)').split('hello, foo')
 ['hello', ', ', 'foo']
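For readers on a newer interpreter: since Python 3.7, re.split() does split on patterns that can match the empty string, so the \b attempt from the question works there, apart from empty strings at the two ends. A minimal sketch, assuming Python 3.7 or later:

```python
import re

# On Python 3.7+, zero-width matches are valid split points.
parts = re.split(r'\b', 'hello, foo')
print(parts)   # ['', 'hello', ', ', 'foo', '']

# Drop the empty strings produced by the boundaries at both ends.
tokens = [p for p in parts if p]
print(tokens)  # ['hello', ', ', 'foo']
```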
