Why does unicode() use str() on my object only when no encoding is given?

I start by creating a string variable with some non-ascii data encoded in utf-8:

>>> text = 'á'
>>> text
'\xc3\xa1'
>>> text.decode('utf-8')
u'\xe1'

Using unicode() on it raises an error...

>>> unicode(text)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: 
                    ordinal not in range(128)

... but if I know the encoding, I can use it as a second parameter:

>>> unicode(text, 'utf-8')
u'\xe1'
>>> unicode(text, 'utf-8') == text.decode('utf-8')
True

Now, if I have a class that returns this text from its __str__() method:

>>> class ReturnsEncoded(object):
...     def __str__(self):
...         return text
... 
>>> r = ReturnsEncoded()
>>> str(r)
'\xc3\xa1'

unicode(r) seems to use str() on it, since it raises the same error as unicode(text) above:

>>> unicode(r)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: 
                    ordinal not in range(128)

So far, everything is as expected!

But, as no one would ever expect, unicode(r, 'utf-8') won't even try:

>>> unicode(r, 'utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: coercing to Unicode: need string or buffer, ReturnsEncoded found

Why? Why this inconsistent behavior? Is it a bug? Is it intended? Very awkward.

+5

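The behavior does seem confusing, but it is intentional. Here is the full entry for unicode() from the Python built-in functions documentation (for version 2.5.2, as I write this):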

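unicode([object[, encoding[, errors]]])

Return the Unicode string version of object using one of the following modes:

If encoding and/or errors are given, unicode() will decode the object, which can either be an 8-bit string or a character buffer, using the codec for encoding. The encoding parameter is a string giving the name of an encoding; if the encoding is not known, LookupError is raised. Error handling is done according to errors; this specifies the treatment of characters which are invalid in the input encoding. If errors is 'strict' (the default), a ValueError is raised on errors, while a value of 'ignore' causes errors to be silently ignored, and a value of 'replace' causes the official Unicode replacement character, U+FFFD, to be used to replace input characters which cannot be decoded. See also the codecs module.

If no optional parameters are given, unicode() will mimic the behaviour of str() except that it returns Unicode strings instead of 8-bit strings. More precisely, if object is a Unicode string or subclass it will return that Unicode string without any additional decoding applied.

For objects which provide a __unicode__() method, it will call this method without arguments to create a Unicode string. For all other objects, the 8-bit string version or representation is requested and then converted to a Unicode string using the codec for the default encoding in 'strict' mode.

New in version 2.0. Changed in version 2.2: Support for __unicode__() added.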
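
To make the error modes in the quoted documentation concrete, here is a small sketch. It uses the decode() method of byte strings, which accepts the same encoding and errors arguments, so it should run on both Python 2 and Python 3. The variable names are mine, not from the docs.

```python
# U+00E1 (a with acute accent) encoded as UTF-8, as in the question.
data = b'\xc3\xa1'

decoded = data.decode('utf-8')              # correct codec: one character
ignored = data.decode('ascii', 'ignore')    # undecodable bytes are dropped
replaced = data.decode('ascii', 'replace')  # each bad byte becomes U+FFFD

try:
    data.decode('ascii')                    # errors defaults to 'strict'
    strict_error = None
except ValueError as exc:                   # UnicodeDecodeError is a ValueError
    strict_error = type(exc).__name__

try:
    data.decode('no-such-encoding')         # unknown codec name
    lookup_error = None
except LookupError as exc:
    lookup_error = type(exc).__name__
```

In Python 2, unicode(data, 'ascii', 'replace') goes through the same codec machinery as data.decode('ascii', 'replace').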

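So, when you call unicode(r, 'utf-8'), it requires an 8-bit string or a character buffer as the first argument. Your object is neither, so it is never passed through __str__() and decoded with the utf-8 codec; you get the TypeError instead. Without 'utf-8', unicode() looks for a __unicode__() method on your object and, not finding one, calls __str__(), as you suggested, then tries to convert the result to unicode with the default codec.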

+7

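unicode does not guess the encoding of your text. If your object can output itself as unicode, define a __unicode__() method that returns a Unicode string.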


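The secret is that unicode(r) does not actually call __str__() directly. Instead, it looks for a __unicode__() method. If it cannot find __unicode__(), it falls back to __str__() and then tries to decode the result with the ascii codec. When you pass an encoding, unicode() expects the first argument to be something that can be decoded, that is, an instance of basestring.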
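
The two modes described above can be modeled roughly in plain Python. This is only a sketch of the documented behavior, not CPython's implementation; unicode_sketch is a made-up name, and it hard-codes 'ascii' where Python 2 would use sys.getdefaultencoding(). It is written so that it also runs on Python 3, where byte strings are spelled bytes.

```python
def unicode_sketch(obj, encoding=None, errors=None):
    """Rough model of Python 2's unicode() dispatch (hypothetical helper)."""
    if encoding is not None or errors is not None:
        # Decoding mode: only byte strings or buffers are accepted,
        # so __str__() is never consulted here.
        if not isinstance(obj, (bytes, bytearray, memoryview)):
            raise TypeError('coercing to Unicode: need string or buffer, '
                            '%s found' % type(obj).__name__)
        return bytes(obj).decode(encoding or 'ascii', errors or 'strict')
    # Conversion mode: prefer __unicode__(), else fall back to __str__().
    if hasattr(obj, '__unicode__'):
        return obj.__unicode__()
    raw = bytes(obj) if isinstance(obj, (bytes, bytearray)) else obj.__str__()
    if isinstance(raw, bytes):
        # Assumption: the default encoding is 'ascii', as in the question.
        return raw.decode('ascii', 'strict')
    return raw


class ReturnsEncoded(object):
    def __str__(self):
        return b'\xc3\xa1'  # UTF-8 bytes, as in the question


r = ReturnsEncoded()
```

With this model, unicode_sketch(r) fails with UnicodeDecodeError (the __str__() fallback plus ASCII decoding), while unicode_sketch(r, 'utf-8') fails with TypeError before __str__() is ever called, which matches the two tracebacks in the question.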


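"The behavior is weird because it tries to decode as ascii if I don't pass 'utf-8'. But if I pass 'utf-8' it gives a different error..."

That is because, when you specify 'utf-8', it treats the first parameter as a string-like object to be decoded. Without it, it treats the parameter as an object to be coerced to unicode.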

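I do not see the confusion. If you know that text will always be UTF-8 encoded, define __unicode__() accordingly and everything will work.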
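
A minimal sketch of that suggestion, assuming text always holds UTF-8 bytes as in the question. In Python 2, unicode(r) would call __unicode__(); the snippet calls the method directly so that it also runs on Python 3, where unicode() no longer exists.

```python
text = b'\xc3\xa1'  # UTF-8 bytes for U+00E1, as in the question


class ReturnsEncoded(object):
    def __str__(self):
        return text                  # 8-bit string in Python 2

    def __unicode__(self):
        # Decode with the codec we know is right instead of
        # leaving unicode() to fall back to ASCII.
        return text.decode('utf-8')


r = ReturnsEncoded()
as_unicode = r.__unicode__()  # what unicode(r) uses in Python 2
```

Note that unicode(r, 'utf-8') would still raise TypeError in Python 2, because passing an encoding selects the decoding mode, which only accepts strings and buffers.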

+4
