UTF-8 error with Python and gettext

I use UTF-8 in my editor, so all the strings shown here are UTF-8 encoded in the file.

I have a Python script like this:

    # -*- coding: utf-8 -*-
    ...
    parser = optparse.OptionParser(
        description=_('automates the dice rolling in the classic game "risk"'),
        usage=_("usage: %prog attacking defending"))

Then I used xgettext to extract the strings into a .pot file, which boils down to:

    "Content-Type: text/plain; charset=CHARSET\n"
    "Content-Transfer-Encoding: 8bit\n"

    #: auto_dice.py:16
    msgid "automates the dice rolling in the classic game \"risk\""
    msgstr ""

After that, I used msginit to get de.po, which I filled in as follows:

    "Content-Type: text/plain; charset=UTF-8\n"
    "Content-Transfer-Encoding: 8bit\n"

    #: auto_dice.py:16
    msgid "automates the dice rolling in the classic game \"risk\""
    msgstr "automatisiert das Würfeln bei \"Risiko\""

Running the script, I get the following error:

      File "/usr/lib/python2.6/optparse.py", line 1664, in print_help
        file.write(self.format_help().encode(encoding, "replace"))
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 60: ordinal not in range(128)

How can I fix this?

3 answers

This error means that encode was called on a byte string, so Python first tries to decode it to Unicode using the default system encoding (ascii in Python 2) and then encodes the result with the encoding you specified.

Typically, the way to resolve this is to call s.decode('utf-8') (or whatever encoding the strings are in) before trying to use the strings. It may also work if you simply use unicode literals, as in u'automates...' (this depends on how the strings are substituted from the .po files, which I don't know about).
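To make the implicit step visible, here is a minimal sketch, written for Python 3, where the decode has to be spelled out. The German string is taken from the de.po in the question; the variable names are only illustrative. It shows the ascii decode failing on the 0xc3 byte and the explicit UTF-8 decode succeeding:

```python
# Bytes as a byte-string gettext call would hand them back (UTF-8 encoded).
raw = 'automatisiert das Würfeln bei "Risiko"'.encode("utf-8")

# Python 2 performs this ascii decode implicitly when .encode() is called
# on a byte string; here it is explicit so the failure is visible.
failed_byte = None
try:
    raw.decode("ascii")
except UnicodeDecodeError as err:
    failed_byte = err.object[err.start:err.start + 1]
    print("ascii decode failed on byte", repr(failed_byte))

# Decoding with the real encoding first makes the later encode() safe.
text = raw.decode("utf-8")
output = text.encode("ascii", "replace")
print(output)
```

Once the value is real text, the encode(encoding, "replace") call from the traceback can no longer raise; characters the target encoding cannot represent are replaced instead.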

This confusing behavior is cleaned up in Python 3, which will not try to convert bytes to unicode unless you explicitly tell it to.
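For comparison, a small sketch of what Python 3 does when bytes and text are mixed. It assumes Python 3, and the names are illustrative only:

```python
raw = "Würfeln".encode("utf-8")   # a bytes object

# bytes objects have no encode() method in Python 3, so the accidental
# encode-on-bytes from the question cannot happen silently.
has_encode = hasattr(raw, "encode")

# Mixing bytes and str is refused instead of triggering a hidden decode.
mix_error = None
try:
    "description: " + raw
except TypeError as err:
    mix_error = err

# The conversion only happens when it is requested explicitly.
text = raw.decode("utf-8")
print(has_encode, type(mix_error).__name__, len(text))
```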


I suspect the problem is that _("string") returns a byte string, not a Unicode string.

The obvious workaround is the following:

    parser = optparse.OptionParser(
        description=_('automates the dice rolling in the classic game "risk"').decode('utf-8'),
        usage=_("usage: %prog attacking defending").decode('utf-8'))

But that doesn't feel right.

ugettext or install(unicode=True) may help.

The Python gettext docs give the following examples:

    import gettext
    t = gettext.translation('spam', '/usr/share/locale')
    _ = t.ugettext

or

    import gettext
    gettext.install('myapplication', '/usr/share/locale', unicode=1)

I am trying to reproduce your problem, and even when I use install(unicode=1), I get back a byte string (type str).

Either I'm using gettext incorrectly, or I'm missing a character encoding declaration in my .po/.mo file.

I will update when I find out more.

    xlt = _('automates the dice rolling in the classic game "risk"')
    print type(xlt)
    if isinstance(xlt, str):
        print 'gettext returned a str (wrong)'
        print xlt
        print xlt.decode('utf-8').encode('utf-8')
    elif isinstance(xlt, unicode):
        print 'gettext returned a unicode (right)'
        print xlt.encode('utf-8')

(Another possibility is to use escape sequences or Unicode code points in the .po file, but that doesn't sound like fun.)

(Or you could look at the .po files installed on your system to see how they handle non-ASCII characters.)
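On the question of the charset declaration: gettext takes the encoding from the Content-Type header stored under the empty msgid. Here is a rough sketch for Python 3 that builds a tiny catalog in memory and loads it with gettext.GNUTranslations. build_mo is a hypothetical helper that imitates the layout msgfmt produces; it is not part of the gettext module, and the catalog contents are copied from the de.po in the question.

```python
import gettext
import io
import struct


def build_mo(messages):
    """Pack a {msgid: msgstr} dict of byte strings into .mo file bytes."""
    keys = sorted(messages)
    ids = b""
    strs = b""
    spans = []
    for key in keys:
        spans.append((len(key), len(ids), len(messages[key]), len(strs)))
        ids += key + b"\0"
        strs += messages[key] + b"\0"
    header_size = 7 * 4
    key_start = header_size + 16 * len(keys)   # strings follow both index tables
    value_start = key_start + len(ids)
    # magic, version, entry count, msgid table offset, msgstr table offset,
    # hash table size and offset (no hash table is written here)
    out = struct.pack("<7I", 0x950412DE, 0, len(keys),
                      header_size, header_size + 8 * len(keys), 0, 0)
    for klen, koff, _vlen, _voff in spans:
        out += struct.pack("<2I", klen, key_start + koff)
    for _klen, _koff, vlen, voff in spans:
        out += struct.pack("<2I", vlen, value_start + voff)
    return out + ids + strs


catalog = {
    # The empty msgid carries the header, including the charset declaration.
    b"": b"Content-Type: text/plain; charset=UTF-8\n",
    b'automates the dice rolling in the classic game "risk"':
        'automatisiert das Würfeln bei "Risiko"'.encode("utf-8"),
}

t = gettext.GNUTranslations(io.BytesIO(build_mo(catalog)))
_ = t.gettext
text = _('automates the dice rolling in the classic game "risk"')
print(type(text).__name__, ascii(text))
```

With the header in place the lookup returns decoded text, so no .decode('utf-8') is needed at the call site.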


I'm not familiar with this, but it appears to be a known bug in Python 2.6 that was fixed in 2.7:

http://bugs.python.org/issue2931

If you cannot use 2.7, try this workaround:

http://mail.python.org/pipermail/python-dev/2006-May/065458.html
