How to make the word boundary \b not match a dash

I have simplified my code for the specific problem I am facing.

    import re

    pattern = re.compile(r'\bword\b')
    result = pattern.sub(lambda x: "match", "-word- word")

I get

 '-match- match' 

but I want

 '-word- match' 

Edit:

Or, for the string "word -word-",

I want

 "match -word-" 
3 answers

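\b matches at any boundary between a word character (letters, digits and underscore, roughly [a-zA-Z0-9_]) and a non-word character or the start/end of the string. A dash is a non-word character just like a space, so \b matches next to it too. Instead, surround word with a negative lookbehind and a negative lookahead to ensure that there is no non-whitespace character directly before or after it: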

 re.compile(r'(?<!\S)word(?!\S)') 
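A minimal check of this pattern against both strings from the question (a sketch; the loop is only for illustration). Passing a plain string to sub works the same as lambda x: "match" here:

```python
import re

# (?<!\S): the character before "word" (if any) must not be non-whitespace
# (?!\S):  the character after "word" (if any) must not be non-whitespace
pattern = re.compile(r'(?<!\S)word(?!\S)')

for text in ("-word- word", "word -word-"):
    print(pattern.sub("match", text))
```

This prints -word- match and then match -word-. Note that the pattern is stricter than \b: any non-whitespace neighbour blocks the match, so "word." or "(word)" is left unchanged.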

What you need is a negative lookbehind.

    pattern = re.compile(r'(?<!-)\bword\b')
    result = pattern.sub(lambda x: "match", "-word- word")

To quote the documentation:

(?<!...) Matches if the current position in the string is not preceded by a match for ....

Thus, this will only match if the word boundary \b is not preceded by a minus sign.

If you also need this at the end of the word, you will have to use a negative lookahead, which looks like this: (?!-). The full regex then becomes: (?<!-)\bword(?!-)\b
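A quick sketch running the full regex on both strings from the question. The extra inputs "swordfish" and "(word)" are illustrative additions: unlike a whitespace-only rule, this version rejects only dashes, so \b still blocks matches inside longer words while other punctuation next to word is allowed:

```python
import re

# (?<!-) rejects a dash directly before "word", (?!-) a dash directly after it;
# \b still stops "word" from matching inside longer words such as "swordfish"
pattern = re.compile(r'(?<!-)\bword(?!-)\b')

for text in ("-word- word", "word -word-", "swordfish", "(word)"):
    print(pattern.sub("match", text))
```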


Instead of word boundaries, you can also match the characters before and after the word explicitly, using the groups (\s|^) and (\s|$).

Breakdown: \s matches any whitespace character, which seems to be what you are after, since you want to exclude dashes. ^ and $ ensure that a word that comes first or last in the string (i.e., has no character before or after it) also matches.

Your code will look something like this:

    pattern = re.compile(r'(\s|^)(word)(\s|$)')
    result = pattern.sub(r"\1match\3", "-word- word")
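Running this on the question's strings (plus a bare "word", added for illustration) shows how the backreferences work: groups 1 and 3 consume the surrounding whitespace, or match the empty string at ^ or $, and \1 and \3 write it back around the replacement:

```python
import re

pattern = re.compile(r'(\s|^)(word)(\s|$)')

# Group 1 and group 3 hold the separator (or "" at the string edges),
# so the replacement keeps the original spacing intact
for text in ("-word- word", "word -word-", "word"):
    print(repr(pattern.sub(r"\1match\3", text)))
```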

Because this solution spells out the allowed separators explicitly (such as \s), they are easy to replace or extend. For example, if your words may be separated by spaces or commas, your pattern will look something like this: r'(,|\s|^)(word)(,|\s|$)'.
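One thing to watch with capture groups: the separator is consumed by the match, so a separator shared by two adjacent words can only be used once, and "word,word" gets a single replacement. The sketch below compares this with an alternative swapped in here (not part of this answer) that uses zero-width lookarounds over the same separator set, generalizing the (?<!\S)word(?!\S) idea; [^,\s] means "any character that is not a comma or whitespace":

```python
import re

# Capture-group version: separators are comma, whitespace, or the string edges
groups = re.compile(r'(,|\s|^)(word)(,|\s|$)')

# Lookaround alternative: "word" must not be directly preceded or followed
# by anything other than a comma or whitespace; nothing is consumed
lookaround = re.compile(r'(?<![^,\s])word(?![^,\s])')

text = "word,word -word-"
print(groups.sub(r"\1match\3", text))   # the shared comma blocks the 2nd match
print(lookaround.sub("match", text))    # both comma-separated words replaced
```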

