How can I successfully use UNICODE characters in my .py files without any problems?

Question

I am writing a test for a database that contains Swedish characters. In the test, I directly use characters with umlauts and other similar Swedish letters, and it runs fine: it reads the file names from the database and the string comparisons succeed.

However, when importing this file to generate pydoc, I get an all-too-familiar exception:

SyntaxError: Non-ASCII character '\xc3' in file foo.py on line 1, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details

Having done some research on my own, I found that adding

# -*- coding: iso-8859-15 -*-

to the top of my file fixed the import problem. However, the test now fails all of its string comparisons. I tried the alternative of dropping the encoding declaration and writing the strings as

 u"Bokmärken"

... but this still does not get rid of the exception.

Does anyone know a good way to fix this?

python unicode

Staunch Jul 12 '11 at 17:13

1 answer

shelhamer · Accepted Answer · 2011-07-12T17:22:13+0000

You need to set the encoding in your editor and in your database so that they match. If your database is encoded in utf-8 rather than iso-8859-15, then setting your editor to utf-8 should fix it. However, since your u'string' comparisons do not work either, this may not be the case.
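To illustrate why the two encodings have to agree, here is a minimal sketch. It assumes the text is stored somewhere as utf-8 bytes and then read back as iso-8859-15; the variable names are made up for the example, and the word is written with a \u escape so the snippet does not depend on how it is saved. Note that the '\xc3' in the error message is the first byte of "ä" in utf-8, which suggests the file itself is saved as utf-8.

```python
# -*- coding: utf-8 -*-
# "Bokmärken", written with an escape so the source file stays pure ASCII
text = u"Bokm\u00e4rken"

# The same text produces different bytes under the two encodings
utf8_bytes = text.encode("utf-8")
latin9_bytes = text.encode("iso-8859-15")
print(repr(utf8_bytes))
print(repr(latin9_bytes))

# Reading utf-8 bytes as if they were iso-8859-15 does not raise an error;
# it silently yields a different string, so equality checks fail
misdecoded = utf8_bytes.decode("iso-8859-15")
print(repr(misdecoded))
print(misdecoded == text)
```

If the mismatch is of this kind, the comparison fails without any exception, which would match a test that imports cleanly but no longer finds equal strings.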

Replace

 # -*- coding: iso-8859-15 -*-

with

 # -*- coding: utf-8 -*-

or (equivalent)

 # coding=utf-8

Try utf-8 encoding.

Put the encoding declaration immediately after the interpreter (shebang) line, if the file has one.

Printing some debug output with repr('swedish string') and repr(u'swedish string') would also be useful for checking for differences.

Can you tell us which encoding your database uses? Also, was the data in the database written from Python or inserted directly? You may have written incorrectly encoded data to the database, which now causes problems when comparing.
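A rough sketch of that kind of repr check follows. The helper `describe` and the value `from_db` are hypothetical: `from_db` stands in for whatever raw value your database driver returns, and utf-8 is only an assumption about how it is encoded.

```python
# -*- coding: utf-8 -*-
def describe(label, value):
    """Return one line showing the value's type and its repr."""
    return "%s: %s %r" % (label, type(value).__name__, value)

# The string the test expects ("Bokmärken", written with an escape)
expected = u"Bokm\u00e4rken"

# Hypothetical raw bytes as they might come back from the database
from_db = b"Bokm\xc3\xa4rken"

print(describe("expected", expected))
print(describe("from_db", from_db))

# Decode with the encoding the database is assumed to use, then compare
decoded = from_db.decode("utf-8")
print(describe("decoded", decoded))
print(decoded == expected)
```

Comparing the two repr lines side by side shows whether you are comparing bytes with text, and whether the bytes look like utf-8 (two bytes for "ä") or iso-8859-15 (one byte).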
