Remove all hexadecimal characters from a string in Python

Although there are similar questions, I cannot find a working solution for my case:

I keep coming across annoying hexadecimal characters in strings, for example:

'\xe2\x80\x9chttp://www.google.com\xe2\x80\x9d blah blah#%#@$^blah'

I need to remove these hexadecimal characters (\xHH), and only them, to get the following result:

'http://www.google.com blah blah#%#@$^blah'

Decoding does not help:

s.decode('utf8') # u'\u201chttp://www.google.com\u201d blah blah#%#@$^blah'

How can I achieve this?

+4
3 answers

Just remove all non-ASCII characters:

>>> s.decode('utf8').encode('ascii', errors='ignore')
'http://www.google.com blah blah#%#@$^blah'

Another possible solution:

>>> import string
>>> s = '\xe2\x80\x9chttp://www.google.com\xe2\x80\x9d blah blah#%#@$^blah'
>>> printable = set(string.printable)
>>> filter(lambda x: x in printable, s)
'http://www.google.com blah blah#%#@$^blah'

Or use regular expressions:

>>> import re
>>> re.sub(r'[^\x00-\x7f]', r'', s)
'http://www.google.com blah blah#%#@$^blah'

Choose your favorite.
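
The sessions above are Python 2, where s is a byte string. As a minimal sketch of the same three approaches under Python 3 (assuming the input arrives as UTF-8 encoded bytes; the variable names are illustrative only):

```python
import re
import string

# Assumption: the raw input is a bytes object holding UTF-8 encoded text.
raw = b'\xe2\x80\x9chttp://www.google.com\xe2\x80\x9d blah blah#%#@$^blah'
text = raw.decode('utf-8')

# 1. Round-trip through ASCII, dropping anything that cannot be encoded.
via_codec = text.encode('ascii', errors='ignore').decode('ascii')

# 2. Keep only the characters listed in string.printable.
printable = set(string.printable)
via_filter = ''.join(ch for ch in text if ch in printable)

# 3. Delete every character outside the ASCII range with a regex.
via_regex = re.sub(r'[^\x00-\x7f]', '', text)

print(via_codec)
```

The three give the same result for this input, but they are not equivalent in general: string.printable excludes most ASCII control characters, while the codec and regex variants keep them.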

+11
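
These are not "hexadecimal characters" but the internal representation (UTF-8 encoded bytes in the first case, Unicode code points in the second) of the Unicode characters LEFT DOUBLE QUOTATION MARK ('“') and RIGHT DOUBLE QUOTATION MARK ('”').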

>>> s = "\xe2\x80\x9chttp://www.google.com\xe2\x80\x9d blah blah#%#@$^blah"
>>> print s
“http://www.google.com” blah blah#%#@$^blah
>>> s.decode("utf-8")
u'\u201chttp://www.google.com\u201d blah blah#%#@$^blah'
>>> print s.decode("utf-8")
“http://www.google.com” blah blah#%#@$^blah

If you only want to remove those two characters, you can simply use str.replace():

>>> s.replace("\xe2\x80\x9c", "").replace("\xe2\x80\x9d", "")
'http://www.google.com blah blah#%#@$^blah'

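If you want to get rid of all non-ASCII characters, decode the string to unicode and then encode it to ascii with the "ignore" error handler: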

>>> s.decode("utf-8").encode("ascii", "ignore")
'http://www.google.com blah blah#%#@$^blah'
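
To check which characters the escaped bytes in these examples actually stand for, the standard unicodedata module can report their official names. A small sketch in Python 3 (an assumption, since the sessions above are Python 2), with the input taken as UTF-8 bytes:

```python
import unicodedata

# Assumption: the input is UTF-8 encoded bytes, as in the examples above.
raw = b'\xe2\x80\x9chttp://www.google.com\xe2\x80\x9d'
text = raw.decode('utf-8')

# Collect the code point and Unicode name of every non-ASCII character.
names = [(hex(ord(ch)), unicodedata.name(ch)) for ch in text if ord(ch) > 127]
print(names)
# [('0x201c', 'LEFT DOUBLE QUOTATION MARK'), ('0x201d', 'RIGHT DOUBLE QUOTATION MARK')]
```

Knowing exactly which characters are present makes it easier to decide between removing just those two and stripping everything outside ASCII.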
+5

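To choose which characters to keep, you can use the constants defined in the string module, for example string.ascii_letters (the concatenation of string.ascii_lowercase and string.ascii_uppercase), string.digits, string.printable and string.punctuation.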

string.printable is probably the one you want here, since it contains the digits, letters, punctuation and whitespace.

For example:

import string
valid_characters = string.printable
start_string = '\xe2\x80\x9chttp://www.google.com\xe2\x80\x9d blah blah#%#@$^blah'
end_string = ''.join(i for i in start_string if i in valid_characters)
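
The same idea can be wrapped in a small helper so that the set of allowed characters is a parameter. This is only a sketch for Python 3; keep_only is a hypothetical name, not part of the string module, and the input is assumed to be UTF-8 bytes that are decoded first:

```python
import string

def keep_only(text, allowed=string.printable):
    """Return text with every character not in `allowed` removed."""
    allowed_set = set(allowed)  # set lookup avoids rescanning the string
    return ''.join(ch for ch in text if ch in allowed_set)

# Assumption: the raw data is UTF-8 encoded bytes.
raw = b'\xe2\x80\x9chttp://www.google.com\xe2\x80\x9d blah blah#%#@$^blah'
start_string = raw.decode('utf-8')

end_string = keep_only(start_string)
alnum_only = keep_only(start_string, string.ascii_letters + string.digits)
print(end_string)
print(alnum_only)
```

Passing a narrower constant, such as string.ascii_letters + string.digits, also strips punctuation and whitespace, which may or may not be what is wanted.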
+1
