Python is trying to be helpful. You cannot decode Unicode data; it is already decoded. So Python first encodes the data (using the ASCII codec) to get bytes that it can then decode. It is this implicit encoding that fails.
If you have Unicode data, it only makes sense to encode it to UTF-8, not to decode it:
    >>> print u'\u041e\u043b\u044c\u0433\u0430'
    Ольга
    >>> u'\u041e\u043b\u044c\u0433\u0430'.encode('utf8')
    '\xd0\x9e\xd0\xbb\xd1\x8c\xd0\xb3\xd0\xb0'
If you want a Unicode value, the Unicode literal (u'...') is all you need. No further decoding is required.
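To see why the implicit step fails, you can perform it by hand. This is only a sketch: it assumes the hidden conversion behaves like an explicit .encode('ascii') call, and the names text and implicit_encode_ok are purely illustrative.

```python
# -*- coding: utf-8 -*-
# Illustrative value; any Unicode string with non-ASCII characters behaves the same.
text = u'\u041e\u043b\u044c\u0433\u0430'

# Assumption: the implicit step is equivalent to an explicit ASCII encode.
try:
    text.encode('ascii')
    implicit_encode_ok = True
except UnicodeEncodeError:
    # The Cyrillic code points lie outside the ASCII range (0-127).
    implicit_encode_ok = False

print(implicit_encode_ok)
```

The encode raises UnicodeEncodeError before any decoding could take place, which is why calling .decode() on a Unicode value reports an encoding error.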
The same implicit conversion happens in the other direction; if you try to encode a byte string, you trigger an implicit decode:
    >>> u'\u041e\u043b\u044c\u0433\u0430'.encode('utf8').encode('utf8')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 0: ordinal not in range(128)
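The reverse case can be spelled out the same way. Again a sketch under the assumption that the hidden step matches an explicit .decode('ascii') call; the names data, implicit_decode_ok and round_trip are illustrative.

```python
# -*- coding: utf-8 -*-
text = u'\u041e\u043b\u044c\u0433\u0430'
data = text.encode('utf8')  # UTF-8 encoded byte string

# Assumption: the implicit step is equivalent to an explicit ASCII decode.
try:
    data.decode('ascii')
    implicit_decode_ok = True
except UnicodeDecodeError:
    # The first byte, 0xd0, is not a valid ASCII byte.
    implicit_decode_ok = False

# Decoding with the codec that produced the bytes restores the original text.
round_trip = data.decode('utf8')

print(implicit_decode_ok)
print(round_trip == text)
```

Encode Unicode once to get bytes and decode bytes once to get Unicode, naming the codec explicitly each time, and neither implicit ASCII conversion is triggered.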
Martijn Pieters