I am trying to match and remove all words in a list from a string using a compiled regular expression, but I am struggling to avoid matches that occur inside other words.
Current:
REMOVE_LIST = ["a", "an", "as", "at", ...]
remove = '|'.join(REMOVE_LIST)
regex = re.compile(r'('+remove+')', flags=re.IGNORECASE)
out = regex.sub("", text)
In: "A quick brown fox jumped over ant"
Out: "quick brown fox jumped over t"
Expected: "quick brown fox jumped over ant"
I tried changing the compile line to the following, but to no avail:
regex = re.compile(r'\b('+remove+')\b', flags=re.IGNORECASE)
Any suggestions, or am I missing something glaringly obvious?
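
One likely culprit, offered as a sketch rather than a definitive diagnosis: in the second attempt only the first string literal carries the r prefix, so Python parses the trailing ')\b' as a closing parenthesis followed by a backspace character (\x08), not a word boundary. The version below makes both literals raw. The four-word REMOVE_LIST, the use of re.escape, the non-capturing group, and the whitespace cleanup at the end are my own assumptions for illustration, since the full list is not shown above.

```python
import re

# Hypothetical short list for illustration; the real list is longer.
REMOVE_LIST = ["a", "an", "as", "at"]

# Escape each word in case the real list contains regex metacharacters.
remove = "|".join(re.escape(word) for word in REMOVE_LIST)

# Both literals are raw strings, so each \b reaches the regex engine
# as a word-boundary assertion instead of a backspace character.
regex = re.compile(r"\b(?:" + remove + r")\b", flags=re.IGNORECASE)

text = "A quick brown fox jumped over ant"
out = regex.sub("", text)

# Removing whole words leaves stray spaces behind; collapse them.
cleaned = " ".join(out.split())
print(cleaned)
```

With this pattern "ant" is left alone, because no word boundary follows the "a" or the "an" inside it.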