Best output type and encoding practices for __repr__() functions?

Question


Recently, I ran into problems with __repr__(), format() and encodings. Should the output of __repr__() be an encoded byte string or a Unicode string? Is there a best encoding to use for the result of __repr__() in Python? What I want to output contains non-ASCII characters.

I am using Python 2.x, and I want to write code that can easily be adapted to Python 3. The program therefore starts with:

 # -*- coding: utf-8 -*-
 from __future__ import unicode_literals, print_function
 # With unicode_literals, a literal such as 'Hello' is a Unicode object

Here are some additional issues that bother me, and I'm looking for a solution that solves them:

Printing to a UTF-8 terminal should work (my sys.stdout.encoding is set to UTF-8, but it would be better if other encodings worked too).
Redirecting the output to a file (encoded in UTF-8) should work as well (in this case sys.stdout.encoding is None).
My code for many __repr__() functions currently contains a lot of return ….encode('utf-8'), which is tedious. Is there something robust and lightweight?
In some cases I even have ugly beasts such as return ('<{}>'.format(repr(x).decode('utf-8'))).encode('utf-8'), i.e. the representation of an object is decoded, inserted into the format string, and then re-encoded. I would like to avoid such convoluted conversions.

What would you advise for writing simple __repr__() functions that cope well with these encoding issues?


python encoding ascii repr

EOL 02 Sep '10 at 13:57


3 answers

I think a decorator can cleanly handle the __repr__ incompatibility between Python 2 and 3. Here is what I use:

 from __future__ import unicode_literals, print_function
 import sys
 def force_encoded_string_output(func):
     if sys.version_info.major < 3:
         # Python 2: encode the text returned by __repr__ into a byte string
         def _func(*args, **kwargs):
             return func(*args, **kwargs).encode(sys.stdout.encoding or 'utf-8')
         return _func
     else:
         # Python 3: __repr__ already returns text, so leave it unchanged
         return func
 class MyDummyClass(object):
     @force_encoded_string_output
     def __repr__(self):
         return 'My Dummy Class! \N{WHITE SMILING FACE}'
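A slightly more portable sketch of the same decorator, offered as a variant rather than a correction: `sys.version_info[0]` also works on Python 2.6 (the named attribute `.major` was added in 2.7), `functools.wraps` keeps the wrapped method's `__name__` and docstring, and `getattr` covers stdout replacements that have no `encoding` attribute. It uses explicit `u''` literals, so it does not depend on the `__future__` import.

```python
import functools
import sys


def force_encoded_string_output(func):
    """Make a text-returning __repr__ return a byte string on Python 2 only."""
    if sys.version_info[0] >= 3:
        # Python 3: __repr__ must return text (str), so leave it untouched.
        return func

    @functools.wraps(func)
    def _func(*args, **kwargs):
        # Python 2: __repr__ must return a byte string, so encode the text
        # for the current stdout, falling back to UTF-8 (e.g. when redirected).
        encoding = getattr(sys.stdout, 'encoding', None) or 'utf-8'
        return func(*args, **kwargs).encode(encoding)

    return _func


class MyDummyClass(object):
    @force_encoded_string_output
    def __repr__(self):
        return u'My Dummy Class! \N{WHITE SMILING FACE}'
```

On Python 3, `repr(MyDummyClass())` is simply the text `'My Dummy Class! ☺'`; on Python 2 it is that text encoded for stdout.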


Titon Dec 12 '12 at 21:10


I am using the following function:

 import sys
 def stdout_encode(u, default='UTF8'):
     # Use the terminal's encoding if known (it is None when output is redirected)
     if sys.stdout.encoding:
         return u.encode(sys.stdout.encoding)
     return u.encode(default)

Then my __repr__ functions look like this:

 def __repr__(self):
     return stdout_encode(u'<MyClass {0} {1}>'.format(self.abcd, self.efgh))
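If stdout uses a narrower encoding than the text needs (an ASCII terminal, for example), `u.encode(...)` raises `UnicodeEncodeError`. A minimal sketch, assuming it is acceptable to show unencodable characters as escapes: the `errors` parameter below is a hypothetical addition to `stdout_encode`, using the standard `'backslashreplace'` codec error handler. Like the original helper it returns a byte string, so it only suits Python 2 `__repr__` methods.

```python
import sys


def stdout_encode(u, default='UTF8', errors='backslashreplace'):
    """Encode text for sys.stdout, falling back to `default`.

    The `errors` parameter is a hypothetical addition to the helper above:
    'backslashreplace' writes characters the target encoding cannot
    represent as escapes (such as \\u263a) instead of raising
    UnicodeEncodeError.
    """
    # sys.stdout.encoding is None when Python 2 output is redirected to a
    # file, and a replacement stdout object may lack the attribute entirely.
    encoding = getattr(sys.stdout, 'encoding', None) or default
    return u.encode(encoding, errors)
```

A `__repr__` calls it exactly as before; on a UTF-8 terminal nothing changes, while an ASCII-only stream shows an escape such as `\u263a` instead of failing.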


Buttons840 May 17 '12 at 15:59


unutbu · Accepted Answer · 2010-09-02 14:01

In Python 2, __repr__ (and __str__) should return a str object (a byte string), not a unicode object. In Python 3 the situation is reversed: __repr__ and __str__ should return str objects (Unicode text), not bytes objects:

 # Python 2 session
 class Foo(object):
     def __repr__(self):
         return u'\N{WHITE SMILING FACE}'
 class Bar(object):
     def __repr__(self):
         return u'\N{WHITE SMILING FACE}'.encode('utf8')
 print(repr(Bar()))  # prints ☺ on a UTF-8 terminal
 repr(Foo())  # UnicodeEncodeError: 'ascii' codec can't encode character u'\u263a' in position 0: ordinal not in range(128)
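The reversal is easy to observe. A minimal sketch, assuming a Python 3 interpreter: the same two classes are defined again, and `repr()` accepts the text returned by `Foo` but rejects the `bytes` returned by `Bar` with a `TypeError`. The helper `repr_or_error` is hypothetical, added only to report the outcome.

```python
class Foo(object):
    def __repr__(self):
        return u'\N{WHITE SMILING FACE}'  # text: what Python 3 expects


class Bar(object):
    def __repr__(self):
        return u'\N{WHITE SMILING FACE}'.encode('utf8')  # bytes: rejected by Python 3


def repr_or_error(obj):
    """Return repr(obj), or the name of the exception repr() raised."""
    try:
        return repr(obj)
    except Exception as exc:  # e.g. TypeError: __repr__ returned non-string
        return type(exc).__name__
```

Run under Python 2, the two outcomes swap, as in the session above.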

In Python 2 you really have no choice: you must pick an encoding for the return value of __repr__.

By the way, have you read the PrintFails wiki page? It may not answer your other questions directly, but I found it useful for understanding why some of these errors occur.

When using from __future__ import unicode_literals,

 ('<{}>'.format(repr(x).decode('utf-8'))).encode('utf-8')

can simply be written as

 str('<{}>').format(repr(x))

This assumes that repr(x) returns a UTF-8 encoded str on your system.

Without from __future__ import unicode_literals, the expression can simply be written as:

 '<{}>'.format(repr(x))
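`str.format` can also apply `repr()` itself through the `!r` conversion, so `'<{!r}>'.format(x)` gives the same result as `'<{}>'.format(repr(x))`. A short sketch, assuming Python 3, where the format string and the `repr()` result are both text and no decode/encode round trip is needed; the classes `Smiley` and `Wrapper` are hypothetical.

```python
class Smiley(object):
    def __repr__(self):
        return u'\N{WHITE SMILING FACE}'


class Wrapper(object):
    """Hypothetical container whose repr embeds its item's repr."""

    def __init__(self, item):
        self.item = item

    def __repr__(self):
        # '!r' calls repr(self.item) before formatting; on Python 3 both
        # sides are text, so nothing has to be decoded or re-encoded.
        return '<{!r}>'.format(self.item)
```

On Python 2 the `!r` form carries the same encoding caveats as the explicit `repr(x)` call.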
