Remove all occurrences of words in a string from python list

Question

I am trying to match and remove all words in a list from a string using a compiled regular expression, but I am struggling to keep it from matching inside words.

Current:

  REMOVE_LIST = ["a", "an", "as", "at", ...]

  remove = '|'.join(REMOVE_LIST)
  regex = re.compile(r'('+remove+')', flags=re.IGNORECASE)
  out = regex.sub("", text)

In: "A quick brown fox jumped over ant"

Out: "quick brown fox jumped over t"

Expected: "quick brown fox jumped over ant"

I tried changing the compile line to the following, but to no avail:

  regex = re.compile(r'\b('+remove+')\b', flags=re.IGNORECASE)

Any suggestions, or am I missing something blindingly obvious?

+6

python regex

Ogre Mar 15 '13 at 15:06

2 answers

Here is a suggestion without using a regular expression that you might consider:

 >>> sentence = 'word1 word2 word3 word1 word2 word4'
 >>> remove_list = ['word1', 'word2']
 >>> word_list = sentence.split()
 >>> ' '.join([i for i in word_list if i not in remove_list])
 'word3 word4'
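Note that the comparison above is case-sensitive, so a capitalized "A" would survive even with "a" in the list. A possible variant, as a rough sketch only: the helper name `remove_words` is made up for illustration, and the remove list is the shortened one from the question.

```python
def remove_words(sentence, remove_list):
    """Drop whitespace-separated tokens found in remove_list, ignoring case."""
    # A set gives fast membership tests; lowercasing both sides
    # makes the comparison case-insensitive.
    remove_set = {word.lower() for word in remove_list}
    return ' '.join(w for w in sentence.split() if w.lower() not in remove_set)


result = remove_words('A quick brown fox jumped over ant', ['a', 'an', 'as', 'at'])
print(result)  # quick brown fox jumped over ant
```

This still compares whole tokens, so a word with punctuation attached (for example "a,") would not be removed.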

+16

jurgenreza Mar 15 '13 at 15:19

NPE · Accepted Answer · 2013-03-15T15:11:33+0000

One problem is that only the first \b is inside a raw string. The second is interpreted as the backspace character (ASCII 8) rather than as a word boundary.
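A quick way to see the difference for yourself (just an illustration; the variable names `plain` and `raw` and the sample text are mine):

```python
import re

plain = '\b'   # ordinary string literal: a single backspace character
raw = r'\b'    # raw string literal: a backslash followed by the letter b

print(plain == '\x08', len(plain))  # True 1
print(len(raw))                     # 2

# Only the raw version is seen by the regex engine as a word boundary.
print(re.search(raw, 'an ant') is not None)    # True
print(re.search(plain, 'an ant') is not None)  # False
```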

To fix, change

 regex = re.compile(r'\b('+remove+')\b', flags=re.IGNORECASE)

to

 regex = re.compile(r'\b('+remove+r')\b', flags=re.IGNORECASE)
                                  ^ THIS
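For completeness, here is how the corrected line might look in a full script. Treat it as a sketch: the remove list is cut down to the four entries visible in the question, and the `re.escape` call and the whitespace cleanup at the end are my additions, not part of the original code.

```python
import re

# Shortened list; the one in the question is longer.
REMOVE_LIST = ["a", "an", "as", "at"]

# re.escape guards against entries that contain regex metacharacters.
remove = '|'.join(re.escape(word) for word in REMOVE_LIST)

# Both halves of the pattern are raw strings, so both \b are word boundaries.
regex = re.compile(r'\b(' + remove + r')\b', flags=re.IGNORECASE)

text = "A quick brown fox jumped over ant"
out = regex.sub("", text)
print(repr(out))  # ' quick brown fox jumped over ant'

# Removing a word leaves its surrounding spaces behind; collapse them.
cleaned = ' '.join(out.split())
print(cleaned)  # quick brown fox jumped over ant
```

Since the substitution only deletes the words themselves, the leftover spaces need a separate pass if they matter to you.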
