UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-5: ordinal not in range(128)

Question


I'm just trying to decode a string like \uXXXX\uXXXX\uXXXX, but I get an error message:

    $ python
    Python 2.7.6 (default, Sep 9 2014, 15:04:36)
    [GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.39)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> print u'\u041e\u043b\u044c\u0433\u0430'.decode('utf-8')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/encodings/utf_8.py", line 16, in decode
        return codecs.utf_8_decode(input, errors, True)
    UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-4: ordinal not in range(128)

I am new to Python. What is the problem here? Thanks!

+8

python python-2.7 utf-8 decode

Serhii Matrunchyk Feb 16 '15 at 15:23


2 answers

You can set the default encoding to UTF-8:

    import sys
    reload(sys)
    sys.setdefaultencoding('utf-8')

+10

Ranvijay sachan Oct 9 '15 at 5:22


Martijn Pieters · Accepted Answer · 2015-02-16T15:25:07+0000

Python is trying to be helpful here. You cannot decode Unicode data; it is already decoded. So Python first encodes the data (using the ASCII codec) to get bytes that it can then decode. It is this implicit encoding that fails.
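The implicit step can be made visible by spelling it out. The following is a minimal sketch, not Python's actual implementation: the helper name decode_like_python2 is hypothetical, and the code is written so that it also runs on Python 3, where the hidden ASCII encode has to be called explicitly.

```python
text = u'\u041e\u043b\u044c\u0433\u0430'  # already Unicode text, not bytes


def decode_like_python2(value):
    """Hypothetical helper: roughly what Python 2 does for unicode.decode('utf-8')."""
    raw = value.encode('ascii')   # the hidden step; fails on non-ASCII characters
    return raw.decode('utf-8')    # never reached for the Cyrillic string above


try:
    decode_like_python2(text)
    error_message = None
except UnicodeEncodeError as exc:
    # The failure is an *encode* error even though decode() was requested.
    error_message = str(exc)

print(error_message)
```

Pure ASCII text survives the hidden encode, which is why the same mistake often goes unnoticed until non-ASCII data shows up.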

If you have Unicode data, it only really makes sense to encode it to UTF-8, not to decode it:

    >>> print u'\u041e\u043b\u044c\u0433\u0430'
    Ольга
    >>> u'\u041e\u043b\u044c\u0433\u0430'.encode('utf8')
    '\xd0\x9e\xd0\xbb\xd1\x8c\xd0\xb3\xd0\xb0'

If you wanted a Unicode value, then using a Unicode literal (u'...') is all you needed to do. No further decoding is necessary.
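As a small illustration of which direction each call goes, here is a sketch of a full round trip. The variable names are only for illustration; the code runs unchanged on Python 2.7 and Python 3.

```python
text = u'\u041e\u043b\u044c\u0433\u0430'   # Unicode text; nothing to decode here

encoded = text.encode('utf8')              # text -> bytes
decoded = encoded.decode('utf8')           # bytes -> text

# Five code points become ten bytes, because each of these Cyrillic
# letters takes two bytes in UTF-8.
text_length = len(text)
byte_length = len(encoded)
round_trip_ok = (decoded == text)
```

In short, encode() belongs on text and decode() belongs on bytes; calling either on the wrong type is what triggers the implicit ASCII conversion in Python 2.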

The same implicit conversion happens in the other direction; if you try to encode a byte string, you trigger an implicit decoding:

    >>> u'\u041e\u043b\u044c\u0433\u0430'.encode('utf8').encode('utf8')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 0: ordinal not in range(128)
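The reverse case can be sketched the same way. Again, encode_like_python2 is a hypothetical helper that only approximates Python 2's behaviour; it makes the hidden ASCII decode explicit so the example also runs on Python 3.

```python
data = u'\u041e\u043b\u044c\u0433\u0430'.encode('utf8')  # UTF-8 encoded bytes


def encode_like_python2(value):
    """Hypothetical helper: roughly what Python 2 does for str.encode('utf8')."""
    text = value.decode('ascii')  # the hidden step; fails on any byte >= 0x80
    return text.encode('utf8')    # never reached for the bytes above


try:
    encode_like_python2(data)
    error_message = None
except UnicodeDecodeError as exc:
    # The failure is a *decode* error even though encode() was requested.
    error_message = str(exc)

print(error_message)
```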
