UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-5: ordinal not in range(128)

I'm just trying to decode a string of the form \uXXXX\uXXXX\uXXXX, but I get an error message:

    $ python
    Python 2.7.6 (default, Sep 9 2014, 15:04:36)
    [GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.39)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> print u'\u041e\u043b\u044c\u0433\u0430'.decode('utf-8')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/encodings/utf_8.py", line 16, in decode
        return codecs.utf_8_decode(input, errors, True)
    UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-4: ordinal not in range(128)

I am new to Python. What is the problem? Thanks!

2 answers

Python is trying to help. You cannot decode Unicode data; it is already decoded. So Python first tries to encode the data (using the ASCII codec) to get bytes that it can then decode. It is this implicit encoding that fails.
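The implicit step can be reproduced explicitly. This is a minimal sketch, assuming the hidden conversion is equivalent to calling .encode('ascii') on the Unicode value; ascii_encode_failure is a hypothetical helper name used only for illustration.

```python
# -*- coding: utf-8 -*-
# Five Cyrillic letters, already a Unicode string.
text = u'\u041e\u043b\u044c\u0433\u0430'


def ascii_encode_failure(value):
    """Hypothetical helper: return the error raised by an ASCII encode, or None."""
    try:
        value.encode('ascii')
    except UnicodeEncodeError as exc:
        return exc
    return None


error = ascii_encode_failure(text)
# error.start is inclusive and error.end is exclusive, so (0, 5) covers
# the characters at positions 0 through 4, as in the traceback.
print('%s %d %d' % (error.encoding, error.start, error.end))
```

None of the five characters has a code point below 128, so the ASCII codec rejects the whole string before any UTF-8 decoding is attempted.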

If you have Unicode data, it only makes sense to encode it to UTF-8, not to decode it:

    >>> print u'\u041e\u043b\u044c\u0433\u0430'
    Ольга
    >>> u'\u041e\u043b\u044c\u0433\u0430'.encode('utf8')
    '\xd0\x9e\xd0\xbb\xd1\x8c\xd0\xb3\xd0\xb0'

If all you want is the Unicode value, then the Unicode literal (u'...') is all you need. No further decoding is required.
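The two directions can be put side by side. The sketch below is only an illustration of the rule "encode Unicode, decode bytes"; the variable names are arbitrary, and it is written to run on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
text = u'\u041e\u043b\u044c\u0433\u0430'   # Unicode: 5 characters

data = text.encode('utf-8')                # Unicode -> bytes (10 bytes)
restored = data.decode('utf-8')            # bytes -> Unicode again

# Each of these Cyrillic letters takes two bytes in UTF-8.
print('%d characters, %d bytes' % (len(text), len(data)))
print(restored == text)
```

Calling decode only on byte strings and encode only on Unicode strings keeps Python 2 from inserting the implicit ASCII conversion.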

The same implicit conversion happens in the other direction; if you try to encode a byte string, you trigger an implicit decode:

    >>> u'\u041e\u043b\u044c\u0433\u0430'.encode('utf8').encode('utf8')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 0: ordinal not in range(128)
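One way to avoid encoding the same value twice is to check its type first. This is a sketch under the assumption that the text should end up as UTF-8 bytes; to_utf8_bytes is a hypothetical helper, not a standard library function.

```python
# -*- coding: utf-8 -*-
def to_utf8_bytes(value):
    """Hypothetical helper: encode Unicode text, pass byte strings through."""
    if isinstance(value, bytes):
        # Already bytes (this is `str` on Python 2); encoding again would
        # trigger the implicit ASCII decode shown in the traceback.
        return value
    return value.encode('utf-8')


text = u'\u041e\u043b\u044c\u0433\u0430'
once = to_utf8_bytes(text)
twice = to_utf8_bytes(once)   # safe: the second call is a no-op
print(once == twice)
```

Note that on Python 2 an ASCII-only byte string would not raise when encoded again, which is why this kind of bug often shows up only once non-ASCII data arrives.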

You can set the default encoding to UTF-8:

    import sys
    reload(sys)
    sys.setdefaultencoding('utf-8')