Why does unicode() use str() on my object only when no encoding is given?

Question

I start by creating a string variable holding some non-ASCII data encoded as UTF-8:

>>> text = 'á'
>>> text
'\xc3\xa1'
>>> text.decode('utf-8')
u'\xe1'

Calling unicode() on it raises an error ...

>>> unicode(text)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: 
                    ordinal not in range(128)

... but if I know the encoding, I can use it as a second parameter:

>>> unicode(text, 'utf-8')
u'\xe1'
>>> unicode(text, 'utf-8') == text.decode('utf-8')
True

Now, if I have a class that returns this text from its __str__() method:

>>> class ReturnsEncoded(object):
...     def __str__(self):
...         return text
... 
>>> r = ReturnsEncoded()
>>> str(r)
'\xc3\xa1'

unicode(r) seems to call str() on it, since it raises the same error as unicode(text) above:

>>> unicode(r)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: 
                    ordinal not in range(128)

So far, everything goes as expected!

But, as no one would ever expect, unicode(r, 'utf-8') won't even try:

>>> unicode(r, 'utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: coercing to Unicode: need string or buffer, ReturnsEncoded found

Why? Why this inconsistent behavior? Is it a bug? Is it intended? Very awkward.

+5

python encoding unicode

nosklo Sep 20 '08 at 0:53

2

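unicode does not guess the encoding of your text. If your object can print itself as unicode, define a __unicode__() method that returns a Unicode string.

The secret is that unicode(r) is not actually calling __str__() itself. Instead, it looks for a __unicode__() method. When the object has no __unicode__(), the fallback is to call __str__() and then try to decode the result as ascii. When you pass an encoding, unicode() expects the first argument to be something that can be decoded, that is, an instance of basestring.

You commented: "The behavior is weird because it tries to decode as ascii if I don't pass "utf-8". But if I pass "utf-8" it gives a different error..."

That's because, when you specify "utf-8", it treats the first parameter as a string-like object to be decoded. Without it, it treats the parameter as an object to be coerced to unicode.

I do not understand the confusion. If you know that the object's text attribute will always be UTF-8 encoded, just define __unicode__(), and everything will work fine.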
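
A minimal sketch of that suggestion, assuming the bytes held in text are always UTF-8. The class and variable names are taken from the question; the direct call to r.__unicode__() and the version guard are additions for illustration only, so that the snippet also loads on an interpreter without the unicode() built-in.

```python
# -*- coding: utf-8 -*-
import sys

text = b'\xc3\xa1'  # UTF-8 bytes for 'á', the same value as in the question


class ReturnsEncoded(object):
    def __str__(self):
        # Byte-string form, unchanged from the question.
        return text

    def __unicode__(self):
        # Assumption: the bytes are always UTF-8, so decode them explicitly.
        return text.decode('utf-8')


r = ReturnsEncoded()
decoded = r.__unicode__()  # the method unicode(r) looks for on Python 2

if sys.version_info[0] == 2:
    # The unicode() built-in exists only in Python 2.
    assert unicode(r) == decoded
```

With this method in place, unicode(r) no longer goes through the ascii codec. Passing an encoding, as in unicode(r, 'utf-8'), still raises the TypeError, because that form only decodes strings and buffers.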

+4

John Millikin Sep 20 '08 at 0:58

Blair Conrad · Accepted Answer · 2008-09-20T01:32:09+0000

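The behaviour does seem confusing, but it is intentional. I reproduce here the entirety of the unicode documentation from the Python Built-in Functions documentation (for version 2.5.2, as I write this):

unicode([object[, encoding[, errors]]])
Return the Unicode string version of object using one of the following modes:
If encoding and/or errors are given, unicode() will decode the object, which can either be an 8-bit string or a character buffer, using the codec for encoding. The encoding parameter is a string giving the name of an encoding; if the encoding is not known, LookupError is raised. Error handling is done according to errors; this specifies the treatment of characters which are invalid in the input encoding. If errors is 'strict' (the default), a ValueError is raised on errors, while a value of 'ignore' causes errors to be silently ignored, and a value of 'replace' causes the official Unicode replacement character, U+FFFD, to be used to replace input characters which cannot be decoded. See also the codecs module.
If no optional parameters are given, unicode() will mimic the behaviour of str(), except that it returns Unicode strings instead of 8-bit strings. More precisely, if object is a Unicode string or subclass, it will return that Unicode string without any additional decoding applied.
For objects which provide a __unicode__() method, it will call this method without arguments to create a Unicode string. For all other objects, the 8-bit string version or representation is requested and then converted to a Unicode string using the codec for the default encoding in 'strict' mode.
New in version 2.0. Changed in version 2.2: Support for __unicode__() added.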

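So, when you call unicode(r, 'utf-8'), it requires an 8-bit string or a character buffer as the first argument. Your object is neither, and in this mode unicode() does not call __str__() and decode the result with the utf-8 codec; it raises the TypeError you saw. Without the utf-8 argument, unicode() looks for a __unicode__() method on your object and, not finding one, calls the __str__() method, as you suggested, then attempts to use the default codec to convert the result to unicode.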
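
The two modes can be illustrated with a rough pure-Python model. This is only a sketch of the documented behaviour, not the interpreter's actual implementation: model_unicode and outcome are hypothetical helpers, bytes stands in for the 8-bit string type, and 'ascii' stands in for the default encoding.

```python
def model_unicode(obj, encoding=None, errors=None):
    """Rough model of the two modes of Python 2's unicode() (hypothetical)."""
    if encoding is not None or errors is not None:
        # Decoding mode: only byte strings and buffers are accepted;
        # __str__() and __unicode__() are never consulted.
        if not isinstance(obj, (bytes, bytearray)):
            raise TypeError('coercing to Unicode: need string or buffer, '
                            '%s found' % type(obj).__name__)
        return bytes(obj).decode(encoding or 'ascii', errors or 'strict')

    # Conversion mode: mimic str(), but produce a Unicode string.
    if isinstance(obj, type(u'')):
        return obj  # already Unicode, returned without decoding
    if hasattr(obj, '__unicode__'):
        return obj.__unicode__()
    raw = obj if isinstance(obj, bytes) else obj.__str__()
    if isinstance(raw, bytes):
        # 'ascii' stands in for the default encoding, in 'strict' mode.
        return raw.decode('ascii', 'strict')
    return raw


class ReturnsEncoded(object):
    def __str__(self):
        return b'\xc3\xa1'  # UTF-8 bytes, as in the question


def outcome(*args):
    """Return the converted value, or the name of the exception raised."""
    try:
        return model_unicode(*args)
    except (TypeError, UnicodeDecodeError) as exc:
        return type(exc).__name__


print(outcome(b'\xc3\xa1', 'utf-8') == u'\xe1')  # decoding mode succeeds
print(outcome(ReturnsEncoded()))                 # UnicodeDecodeError
print(outcome(ReturnsEncoded(), 'utf-8'))        # TypeError
```

In this model the encoding argument selects the mode: once it is present, the object's own conversion methods are out of the picture, which matches the two different tracebacks in the question.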
