UnicodeDecodeError: 'ascii' codec can't decode

Question

I am reading a file containing Romanian words in Python with file.readline(). Many characters come out wrong because of an encoding problem.

Example:

 >>> a = "aberație"  # type 'str'
 >>> a
 'abera\xc8\x9bie'
 >>> print sys.stdin.encoding
 UTF-8

I tried encode() with utf-8, cp500, etc., but it does not work.

I cannot work out which character encoding I should use.

Thanks in advance.

Edit: the goal is to store each word from the file in a dictionary and, when printing it, to get aberație rather than 'abera\xc8\x9bie'.

python file encoding decoding

lilawood Jun 30 '11 at 21:21

1 answer

Claudiu · Accepted Answer · 2011-06-30T21:26:13+0000

What are you trying to do?

This is a set of bytes:

 BYTES = 'abera\xc8\x9bie'

These bytes are the UTF-8 encoding of the string "aberație". Decode the bytes to get a unicode string:

 >>> BYTES
 'abera\xc8\x9bie'
 >>> print BYTES
 aberație
 >>> abberation = BYTES.decode('utf-8')
 >>> abberation
 u'abera\u021bie'
 >>> print abberation
 aberație
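The error in the title comes from going the other way: in Python 2, calling encode() on a byte string first decodes it implicitly with the ascii codec, and that step fails on any byte above 0x7f. A minimal sketch of the failure and of the correct direction, written with explicit byte literals so that it should run on Python 2.6+ and Python 3 alike (the names `raw`, `word` and `ascii_failed` are illustrative, not from the question):

```python
raw = b'abera\xc8\x9bie'  # UTF-8 bytes, as readline() returns them in Python 2

try:
    raw.decode('ascii')   # the implicit step Python 2 performs inside str.encode()
    ascii_failed = False
except UnicodeDecodeError as exc:
    ascii_failed = True
    print(exc)            # the message names the 'ascii' codec and the offending byte

word = raw.decode('utf-8')  # correct direction: bytes -> unicode text
print(word == u'abera\u021bie')
print(len(raw), len(word))  # the two bytes \xc8\x9b become one character
```

So the fix for the question is decode('utf-8') on the bytes that were read, not encode().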

If you want to save the unicode string to a file, you need to encode it back into bytes using an encoding of your choice:

 >>> abberation.encode('utf-8')
 'abera\xc8\x9bie'
 >>> abberation.encode('utf-16')
 '\xff\xfea\x00b\x00e\x00r\x00a\x00\x1b\x02i\x00e\x00'
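For the goal stated in the question (load words from a file into a dictionary and print them readably), one option is to let `io.open` handle the decoding and encoding. The sketch below assumes the file is UTF-8; the file name `words.txt`, the sample words and the dictionary layout are hypothetical. `io.open` is available from Python 2.6 and is the same function as the built-in `open` in Python 3.

```python
import io
import os
import tempfile

# Hypothetical word list; \u021b is the Romanian letter t with comma below.
words = [u'abera\u021bie', u'\u021bar\u0103']

path = os.path.join(tempfile.mkdtemp(), 'words.txt')  # illustrative file name

# Writing: unicode text is encoded to UTF-8 bytes on the way out.
with io.open(path, 'w', encoding='utf-8') as f:
    for w in words:
        f.write(w + u'\n')

# Reading: bytes are decoded to unicode text on the way in.
lengths = {}
with io.open(path, 'r', encoding='utf-8') as f:
    for line in f:
        word = line.rstrip(u'\r\n')
        lengths[word] = len(word)  # keys are unicode text, not raw bytes

# The file itself holds bytes; the dictionary holds characters.
with io.open(path, 'rb') as f:
    raw = f.read()
print(raw.startswith(b'abera\xc8\x9bie'), sorted(lengths.values()))
```

Printing a single key on a UTF-8 terminal then shows aberație. Printing the whole dictionary in Python 2 still shows escapes such as u'abera\u021bie', because containers display the repr of their items; that is a display detail, not a sign of corrupted data.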
