Python: best practice for dynamically building regex

I have a simple function to remove a word from some text:

    def remove_word_from(word, text):
        if not text or not word:
            return text
        rec = re.compile(r'(^|\s)(' + word + ')($|\s)', re.IGNORECASE)
        return rec.sub(r'\1\3', text, 1)

The problem, of course, is that if the word contains characters like "(" or ")" things break, and in general it seems unsafe to stick an arbitrary word into the middle of a regular expression.
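To make the failure concrete, here is a minimal sketch of what goes wrong. The helper name `build_pattern` and the sample words are made up for illustration; the helper only repeats the string concatenation from the function above.

```python
import re

def build_pattern(word):
    # Same concatenation as in remove_word_from, with no escaping (hypothetical helper).
    return re.compile(r'(^|\s)(' + word + r')($|\s)', re.IGNORECASE)

# An unbalanced parenthesis in the word makes the whole pattern invalid.
try:
    build_pattern("(foo")
except re.error as exc:
    print("invalid pattern:", exc)

# A metacharacter can also change the meaning silently: "." matches any character.
print(bool(build_pattern("a.c").search("x abc y")))  # True
```

The second case is arguably the worse one, since nothing raises and the wrong word is removed.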

What is the best practice for handling such cases? Is there a convenient, safe function I can call to escape the "word" so that it can be used safely?

+6
python regex
3 answers

You can use re.escape(word) to escape the word.
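As a rough sketch of how that fits into the function from the question (same names as the question, with only the `re.escape` call added):

```python
import re

def remove_word_from(word, text):
    """Remove the first whole-word, case-insensitive occurrence of word."""
    if not text or not word:
        return text
    # re.escape backslash-escapes regex metacharacters so word is matched literally.
    rec = re.compile(r'(^|\s)(' + re.escape(word) + r')($|\s)', re.IGNORECASE)
    return rec.sub(r'\1\3', text, 1)

# Parentheses in the word no longer break the pattern.
print(repr(remove_word_from("(foo)", "keep (foo) this")))  # 'keep  this'
```

Note that the whitespace on both sides of the removed word is kept, exactly as in the original function, so the result contains a double space.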

+21

If you don't actually need regular expressions, can't you just use the string replace method?

 text = text.replace(word, '') 

That gets rid of the escaping problem entirely.
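Keep in mind that this does not behave the same as the regex version: `str.replace` is case-sensitive, matches inside longer words, and replaces every occurrence unless you pass a count. A small sketch with made-up sample text:

```python
text = "The cat sat on the catalog"
word = "cat"

# Every occurrence is removed, including the one inside "catalog".
all_removed = text.replace(word, "")
print(repr(all_removed))    # 'The  sat on the alog'

# The optional third argument limits the number of replacements.
first_removed = text.replace(word, "", 1)
print(repr(first_removed))  # 'The  sat on the catalog'

# Matching is case-sensitive, so "Cat" is left alone.
print(repr("Cat nap".replace(word, "")))  # 'Cat nap'
```

Whether that is acceptable depends on whether you need the whole-word, case-insensitive matching from the question.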

0

Write a sanitizer function and pass the word through it first.

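    import re
    from functools import reduce  # reduce is not a builtin in Python 3

    def sanitize(word):
        # Backslash-escape each regex metacharacter in turn.
        def literalize(wd, escapee):
            return wd.replace(escapee, "\\%s" % escapee)
        # The backslash comes first so the escapes added later are not doubled.
        return reduce(literalize, "\\()[]*?{}.+|^$", word)

    def remove_word_from(word, text):
        if not text or not word:
            return text
        rec = re.compile(r'(^|\s)(' + sanitize(word) + r')($|\s)', re.IGNORECASE)
        return rec.sub(r'\1\3', text, 1)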
-1