Split on \b when your regex engine doesn't support it

How can I split on a word boundary in a regular expression engine that does not support splitting on it?

Python's re module can match \b, but does not seem to support splitting on it. I seem to recall dealing with other regex engines that had the same limitation.

input example:

"hello, foo" 

expected output:

 ['hello', ', ', 'foo'] 

Actual output in Python:

 >>> re.compile(r'\b').split('hello, foo')
 ['hello, foo']
5 answers

Splitting on (\W+) gives you the expected result:

 >>> re.compile(r'(\W+)').split('hello, foo')
 ['hello', ', ', 'foo']

You can also use re.findall() for this:

 >>> re.findall(r'.+?\b', 'hello, foo')
 ['hello', ', ', 'foo']
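
One caveat with the findall() pattern above: a trailing run of non-word characters is not followed by a word boundary, so it gets dropped. A minimal sketch of an alternative pattern that alternates between word and non-word runs instead (the helper name tokens is made up for illustration, not part of re):

```python
import re

def tokens(text):
    # Match either a run of word characters or a run of non-word characters,
    # so every character of the input ends up in some piece.
    return re.findall(r'\w+|\W+', text)

print(tokens('hello, foo'))                  # ['hello', ', ', 'foo']
print(tokens('hello, foo!'))                 # ['hello', ', ', 'foo', '!']
print(re.findall(r'.+?\b', 'hello, foo!'))   # ['hello', ', ', 'foo'], the '!' is lost
```

This also avoids relying on "." which does not match newlines unless re.DOTALL is set.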

OK, I figured it out:

Put the split pattern in a capturing group and the matched separators will be included in the output. You can use either \w+ or \W+:

 >>> re.compile(r'(\w+)').split('hello, foo')
 ['', 'hello', ', ', 'foo', '']

To get rid of the empty results, pass the list through filter() with None as the filter function, which drops everything that does not evaluate to true (on Python 3, filter() returns an iterator, so wrap it in list()):

 >>> list(filter(None, re.compile(r'(\w+)').split('hello, foo')))
 ['hello', ', ', 'foo']

Edit: CMS points out that if you use \W+ you do not need filter().
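
Note that skipping the filter step only works when the string starts and ends with word characters, as in the example. A small sketch, assuming you want the same behavior for arbitrary input (the function name split_keep_separators is hypothetical):

```python
import re

def split_keep_separators(text):
    # The capturing group makes re.split() return the separators too.
    parts = re.split(r'(\W+)', text)
    # Leading or trailing non-word text yields empty strings at the ends; drop them.
    return [p for p in parts if p]

print(re.split(r'(\W+)', ', hello!'))      # ['', ', ', 'hello', '!', '']
print(split_keep_separators(', hello!'))   # [', ', 'hello', '!']
```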


Try:

 >>> re.compile(r'\W\b').split('hello, foo')
 ['hello,', 'foo']

This splits on the non-word character just before a word boundary. With \b alone, your example has nothing to split on, because the match is empty.


Interesting. So far, most of the RE engines I've tried have performed this split.

I played around a bit and found that re.compile(r'(\W+)').split('hello, foo') gives the expected result, though I'm not sure how reliable that is in general.
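
For what it's worth, newer Python versions behave differently: since Python 3.7, re.split() also splits on empty matches, so \b can be used directly there. A rough sketch of a helper that gives the same pieces on old and new versions (the name split_on_boundaries is made up for illustration):

```python
import re
import sys

def split_on_boundaries(text):
    """Split text at word boundaries, keeping every non-empty piece."""
    if sys.version_info >= (3, 7):
        # Python 3.7+ splits on empty matches such as \b.
        pieces = re.split(r'\b', text)
    else:
        # Older versions ignore empty matches, so capture the separators instead.
        pieces = re.split(r'(\W+)', text)
    # Either variant can leave empty strings at the ends; drop them.
    return [p for p in pieces if p]

print(split_on_boundaries('hello, foo'))   # ['hello', ', ', 'foo']
```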
