How to make the word boundary \b not match a dash

Question

I have simplified my code for the specific problem I am facing.

 import re
 pattern = re.compile(r'\bword\b')
 result = pattern.sub(lambda x: "match", "-word- word")

I get

 '-match- match'

but I want

 '-word- match'

edit:

Or for the string "word -word-"

I want

 "match -word-"

+6

python regex

alpalalpal 25 sept. '16 at 8:42

3 answers

What you need is a negative lookbehind.

 pattern = re.compile(r'(?<!-)\bword\b')
 result = pattern.sub(lambda x: "match", "-word- word")

To quote the documentation:

(?<!...) Matches if the current position in the string is not preceded by a match for ....

Thus, this will only match if the word boundary \b is not preceded by a minus sign.

If you also need this at the end of the word, you will have to use a negative lookahead, which looks like this: (?!-). The full regex is then: (?<!-)\bword(?!-)\b
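As a quick check of the two variants above, here is a minimal sketch that runs both patterns against the strings from the question. The variable names are only illustrative, and the extra input "word- word" is an assumption added to show why the lookahead matters.

```python
import re

# Lookbehind only: rejects a dash before the word, but not after it.
lookbehind_only = re.compile(r'(?<!-)\bword\b')

# Lookbehind plus lookahead: rejects a dash on either side.
both_sides = re.compile(r'(?<!-)\bword(?!-)\b')

first = both_sides.sub("match", "-word- word")
second = both_sides.sub("match", "word -word-")
trailing_dash = lookbehind_only.sub("match", "word- word")

print(first)          # -word- match
print(second)         # match -word-
print(trailing_dash)  # match- match (the trailing dash is not excluded)
```

With only the lookbehind, a word followed by a dash is still replaced, which is why the lookahead is needed if dashes on both sides should block the match.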

+6

Matthias 25 sept. '16 at 8:54

Instead of word boundaries, you can also match the character before and after the word using the patterns (\s|^) and (\s|$).

Breakdown: \s matches any whitespace character, which seems to be what you are trying to achieve since you want to exclude dashes. ^ and $ ensure that the word also matches when it is the first or the last thing in the string (i.e., there is no character before or after it).

Your code will look something like this:

 pattern = re.compile(r'(\s|^)(word)(\s|$)')
 result = pattern.sub(r"\1match\3", "-word- word")

Since this solution uses explicit character classes such as \s, the delimiters can easily be replaced or extended. For example, if you want your words to be separated by whitespace or commas, your pattern will look something like this: r'(,|\s|^)(word)(,|\s|$)'.
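One caveat with this approach, shown in the sketch below: the groups consume the surrounding whitespace, so two occurrences separated by a single space cannot both match. The "word word" input is an assumption added for illustration and does not come from the question.

```python
import re

# The delimiter groups are part of the match, so they are consumed.
consuming = re.compile(r'(\s|^)(word)(\s|$)')

from_question = consuming.sub(r"\1match\3", "-word- word")
adjacent = consuming.sub(r"\1match\3", "word word")

print(from_question)  # -word- match
print(adjacent)       # match word (the shared space was already consumed)
```

The first match takes the space between the two words as its trailing delimiter, so the second occurrence has no leading whitespace left to match. Lookaround assertions avoid this because they check the neighbouring characters without consuming them.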

0

nikitautiu 25 sept. '16 at 9:13

revo · Accepted Answer · 2016-09-25T09:02:50+0000

\b matches at the boundary between a word character ([a-zA-Z0-9_]) and any other character, which includes spaces but also dashes. Surround word with negative lookarounds to ensure that there is no non-whitespace character before or after it:

 re.compile(r'(?<!\S)word(?!\S)')
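A minimal sketch of how this pattern could be wrapped for reuse. The helper name replace_standalone and its parameters are hypothetical, and the inputs other than the two strings from the question are assumptions added for illustration.

```python
import re

def replace_standalone(text, word, replacement):
    """Replace `word` only where it is delimited by whitespace or string ends.

    Hypothetical helper, not part of the original answer.
    """
    # re.escape guards against regex metacharacters inside `word`.
    pattern = re.compile(r'(?<!\S)' + re.escape(word) + r'(?!\S)')
    # A function replacement avoids backslash processing in `replacement`.
    return pattern.sub(lambda m: replacement, text)

print(replace_standalone("-word- word", "word", "match"))  # -word- match
print(replace_standalone("word -word-", "word", "match"))  # match -word-
print(replace_standalone("word word", "word", "match"))    # match match
print(replace_standalone("word, word", "word", "match"))   # word, match
```

Note that this treats every non-whitespace neighbour as blocking, not just dashes, so "word," is left alone as well. If only dashes should block the match, the (?<!-) and (?!-) variant is the narrower choice.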
