Python: why does str() on some text from a UTF-8 file give a UnicodeEncodeError?

Question


I am processing a UTF-8 file in Python and used simplejson to load it into a dictionary. However, I get a UnicodeEncodeError when I try to turn one of the dictionary values into a string:

import simplejson as json

f = open('my_json.json', 'r')
master_dictionary = json.load(f)
# some JSON wrangling, then it fails on this line...
mysql_string += " ('" + str(v_dict['code'])

This fails with:

Traceback (most recent call last):
  File "my_file.py", line 25, in <module>
    str(v_dict['code']) + "'), "
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf4' in position 35: ordinal not in range(128)

Why does Python even use ASCII? I thought it used UTF-8 by default, and the input came from a UTF-8 file.

$ file my_json.json 
my_json.json: UTF-8 Unicode English text
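The file being UTF-8 only matters at the moment its bytes are decoded into text. A minimal Python 3 sketch of that step, assuming Python 3's built-in json module in place of simplejson; the temporary file and the "Hôtel" value are made up for illustration:

```python
import json
import os
import tempfile

# Write a small, made-up JSON document to a temporary file, encoded as UTF-8.
payload = {"code": "H\u00f4tel"}
with tempfile.NamedTemporaryFile(
    "w", encoding="utf-8", suffix=".json", delete=False
) as tmp:
    json.dump(payload, tmp, ensure_ascii=False)  # store "ô" as raw UTF-8 bytes
    path = tmp.name

# The file's encoding is applied exactly once: when open() decodes the bytes.
with open(path, encoding="utf-8") as f:
    master_dictionary = json.load(f)
os.remove(path)

# What json.load() returns is already decoded text, not UTF-8 bytes.
value = master_dictionary["code"]
print(type(value).__name__, value)  # str Hôtel
```

Any ASCII restriction therefore only appears later, when that text is converted back into bytes, which is what Python 2's str() did implicitly.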

What is the problem?


python character-encoding

AP257 Mar 31 '10 at 16:10


2 answers

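The default encoding in Python 2 is ASCII, not UTF-8, but you can change it:

import sys
reload(sys)  # site.py removes setdefaultencoding at startup; reloading sys restores it
sys.setdefaultencoding("utf-8")

That changes the encoding for the whole interpreter, though; it is better to work with unicode directly.

To get unicode instead of str:

mysql_string += " ('" + unicode(v_dict['code'])

or, if the value is a UTF-8 encoded byte string, decode it explicitly:

mysql_string += " ('" + unicode(v_dict['code'], "utf-8")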
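Passing "utf-8" to unicode() decodes a UTF-8 byte string into text; in Python 3 the same step is bytes.decode(). A minimal sketch, assuming Python 3, with a made-up byte string standing in for a raw value read from the file:

```python
# Made-up UTF-8 bytes for "Hôtel", as they would arrive from a file read in binary mode.
raw = "H\u00f4tel".encode("utf-8")
print(raw)  # b'H\xc3\xb4tel'

# Decoding with the codec the bytes were written in recovers the text.
text = raw.decode("utf-8")
print(text)  # Hôtel

# Decoding with ASCII fails on byte 0xC3, which is outside range(128).
decode_error = None
try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    decode_error = exc  # keep a reference; "exc" is unbound after the except block
print(decode_error)
```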


danben Mar 31 '10 at 16:21

Ignacio Vazquez-Abrams · Accepted Answer · 2010-03-31T16:22:05+0000

Python 2.x uses ASCII as its default encoding. Use unicode.encode() if you want to turn a unicode object into a str:

v_dict['code'].encode('utf-8')
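In Python 2, str() on a unicode object encodes it implicitly with the default ASCII codec, which is where the error in the question comes from. Python 3 never encodes implicitly, so this sketch spells out both encodings with str.encode(); the value of code is a made-up stand-in for v_dict['code']:

```python
# Hypothetical stand-in for v_dict['code'], containing u'\xf4' (ô).
code = "H\u00f4tel"

# UTF-8 can encode every code point, so this always succeeds.
utf8_bytes = code.encode("utf-8")
print(utf8_bytes)  # b'H\xc3\xb4tel'

# ASCII only covers code points 0-127, so "ô" (U+00F4) cannot be encoded.
# This is the same UnicodeEncodeError that Python 2's implicit str() raised.
encode_error = None
try:
    code.encode("ascii")
except UnicodeEncodeError as exc:
    encode_error = exc  # keep a reference; "exc" is unbound after the except block
print(encode_error)  # 'ascii' codec can't encode character '\xf4' in position 1: ordinal not in range(128)

# Decoding the UTF-8 bytes gives back the original text.
print(utf8_bytes.decode("utf-8") == code)  # True
```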
