Python encoding conversion

Here is my problem: I have a variable that is incorrectly encoded, and I want to fix it. In short, I end up with:

myVar=u'\xc3\xa9'

which is incorrect because it holds the UTF-8 encoding of the character 'é' (\u00e9), not the Unicode character itself.

None of the encoding / decoding combinations that I tried seem to solve the problem. I looked at the bytearray object, but you have to provide an encoding, and obviously none of them work.

Basically I need to reinterpret the byte array using the correct encoding. Any ideas on how to do this? Thanks.

+5
2 answers

What you should have done is decode the raw byte string as UTF-8:

>>> b='\xc3\xa9'
>>> b
'\xc3\xa9'
>>> b.decode("UTF-8")
u'\xe9'

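If you already have the bytes stored in a unicode object, calling decode on it directly does not work, because Python 2 first tries to encode the unicode string as ASCII: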

>>> c
u'\xc3\xa9'
>>> c.decode("UTF-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)

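Instead, turn each code point back into a single byte, join the bytes, and decode the result as UTF-8: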

>>> [ chr(ord(x)) for x in c ]
['\xc3', '\xa9']
>>> ''.join(_)
'\xc3\xa9'
>>> _.decode("UTF-8")
u'\xe9'
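
The ord/chr round trip above maps each code point in the range 0-255 to the byte with the same value, which is also what the Latin-1 (ISO-8859-1) codec does. A minimal sketch of the same repair using that codec, assuming every character in the string is below U+0100; the helper name `fix_mojibake` is hypothetical and not part of the original answer:

```python
def fix_mojibake(text):
    """Reinterpret a unicode string whose code points are really UTF-8 bytes.

    Assumes every character is in the range U+0000..U+00FF; anything
    higher makes the Latin-1 encode step raise UnicodeEncodeError.
    """
    # Latin-1 maps code points 0-255 one-to-one onto byte values 0-255,
    # so this step recovers the original raw bytes.
    raw = text.encode("latin-1")
    # The recovered bytes are then decoded with the encoding they were
    # actually written in.
    return raw.decode("utf-8")


my_var = u'\xc3\xa9'
fixed = fix_mojibake(my_var)
```

Written this way, the snippet should behave the same on Python 2 and on Python 3.3 or later, where the `u''` prefix is accepted again.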

+5

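Convert each character to its integer code point with ord, turn each integer back into a (one-byte) string with chr, then decode the joined result: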

>>> u = u'\xc3\xa9'
>>> s = ''.join(chr(ord(c)) for c in u)
>>> unicode(s, encoding='utf-8')
u'\xe9'
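
The session above is Python 2. On Python 3, chr returns a text string rather than a byte, so the join would simply rebuild the same broken string. A rough Python 3 equivalent, sketched under the assumption that all code points are below 256; the function name `reinterpret_as_utf8` is made up for illustration:

```python
def reinterpret_as_utf8(text):
    """Python 3 sketch: treat each code point of `text` as one raw byte."""
    # bytes() accepts an iterable of integers in range(256) and raises
    # ValueError if any code point is 256 or larger.
    raw = bytes(ord(ch) for ch in text)
    # Decode the recovered bytes as UTF-8 to get the intended text.
    return raw.decode("utf-8")


u = '\xc3\xa9'
s = reinterpret_as_utf8(u)
```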
+1
