Using Python 2.7.3, the following works fine for me:
    import re

    # Unicode pattern: \u260e here is the single character BLACK TELEPHONE
    pattern = re.compile(u"<.*?>|&nbsp;|&amp;|\u260e", re.DOTALL | re.M)
    s = u"bla ble \u260e blo"
    re.sub(pattern, "", s)

Output:

    u'bla ble  blo'
As pointed out by @Zack, this works because the string is now unicode: it has already been decoded, so \u260e is no longer a sequence of six separate characters but the single character ☎, the little black telephone.
Once both the string being searched and the regular expression contain the black telephone itself, rather than the character sequence \u260e, they match.
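To make the decoding step concrete, here is a minimal sketch. It assumes the input arrives as UTF-8 bytes (the `raw` value is a made-up example, not taken from the question) and uses a pattern reduced to the telephone alone. The variable names are only for illustration.

```python
# -*- coding: utf-8 -*-
import re

# Hypothetical input: the UTF-8 bytes of "bla ble ☎ blo", e.g. as read from a file.
raw = b"bla ble \xe2\x98\x8e blo"

# Decode first, so the telephone becomes one unicode character.
text = raw.decode("utf-8")

phone = u"\u260e"
print(len(phone))                  # number of characters in unicode text
print(len(phone.encode("utf-8")))  # number of bytes once encoded as UTF-8

# Both the pattern and the searched string are unicode, so they can match.
pattern = re.compile(u"\u260e")
cleaned = pattern.sub(u"", text)
print(repr(cleaned))
```

The point of the sketch is only that the pattern and the searched text should be the same kind of string; if your data uses a different encoding, the `decode` call would need that encoding instead.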
Rubens