Python: best practice for dynamically building regex

Question


I have a simple function to remove a word from some text:

    import re

    def remove_word_from(word, text):
        if not text or not word:
            return text
        rec = re.compile(r'(^|\s)(' + word + r')($|\s)', re.IGNORECASE)
        return rec.sub(r'\1\3', text, 1)

The problem, of course, is that if word contains characters such as "(" or ")", things break, and in general it seems unsafe to stick an arbitrary word into the middle of a regular expression.
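To make the failure concrete, here is a small sketch (the helper name build_pattern and the sample words are made up for illustration). It shows that an unescaped word can either fail to compile or silently match more than intended:

```python
import re

def build_pattern(word):
    # Same unescaped construction as in the function above.
    return re.compile(r'(^|\s)(' + word + r')($|\s)', re.IGNORECASE)

# An unbalanced parenthesis makes the whole pattern invalid.
try:
    build_pattern("foo(")
    compiled = True
except re.error:
    compiled = False

# A metacharacter such as "." compiles fine but matches any character,
# so the word "a.c" also matches the text "abc".
dot_match = build_pattern("a.c").search("abc") is not None
```

Here compiled ends up False and dot_match ends up True, so the problem is not only crashes but also wrong matches.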

What is the best practice for handling such cases? Is there a convenient function I can call to escape "word" so that it is safe to use in the pattern?

+6

python regex

Parand Jan 26 '11 at 16:35


3 answers

If you don't need regular expressions, can't you just use the string replace method?

 text = text.replace(word, '')

This sidesteps the escaping problem entirely.
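One caveat, sketched here with made-up sample text: str.replace is not a drop-in equivalent of the regex version in the question. It is case-sensitive, it removes every occurrence unless a count is given, and it also matches inside longer words:

```python
text = "The cat sat on the catalog, Cat"

# Removes every lowercase "cat", including the one inside "catalog";
# the capitalized "Cat" is left alone because replace is case-sensitive.
all_removed = text.replace("cat", "")

# The optional third argument limits the number of replacements.
first_removed = text.replace("cat", "", 1)
```

Whether that behavior is acceptable depends on whether whole-word, case-insensitive matching matters for your input.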

0

Emmanuel Jan 26 '11 at 16:49


Write a sanitizer function and pass the word through it first.

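    import re
    from functools import reduce  # reduce is no longer a builtin in Python 3

    def sanitize(word):
        # Backslash-escape each listed metacharacter in turn.
        # Note: the list is not exhaustive (e.g. "\", "^" and "$" are missing).
        def literalize(wd, escapee):
            return wd.replace(escapee, "\\%s" % escapee)
        return reduce(literalize, "()[]*?{}.+|", word)

    def remove_word_from(word, text):
        if not text or not word:
            return text
        rec = re.compile(r'(^|\s)(' + sanitize(word) + r')($|\s)', re.IGNORECASE)
        return rec.sub(r'\1\3', text, 1)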

-1

Ishpeck Jan 26 '11 at 16:47


Vlad H · Accepted Answer · 2011-01-26T16:43:19+0000

You can use re.escape(word) to escape the word.
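As a minimal sketch, this is how the function from the question might look with re.escape applied (the sample inputs in the note below are made up for illustration):

```python
import re

def remove_word_from(word, text):
    if not text or not word:
        return text
    # re.escape backslash-escapes regex metacharacters,
    # so the word is matched literally.
    rec = re.compile(r'(^|\s)(' + re.escape(word) + r')($|\s)', re.IGNORECASE)
    # Keep the surrounding whitespace groups and drop only the word,
    # for the first match only.
    return rec.sub(r'\1\3', text, count=1)
```

With this version, remove_word_from("c++", "I like c++ a lot") returns "I like  a lot" (both surrounding spaces are kept), and a word like "(foo)" no longer breaks the pattern.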
