How to use a list of python objects whose representation is unicode

Question

I have an object that contains Unicode data, and I want to use that data in its representation, for example in a view.

    # -*- coding: utf-8 -*-

    class A(object):

        def __unicode__(self):
            return u"©au"

        def __repr__(self):
            return unicode(self).encode("utf-8")

        __str__ = __repr__

    a = A()

    s1 = u"%s" % a              # works
    #s2 = u"%s" % [a]           # gives unicode decode error
    #s3 = u"%s" % unicode([a])  # gives unicode decode error

Even if I return unicode from __repr__, it still gives an error. So the question is: how can I use a list of such objects to create another unicode string?

platform information:

    """
    Python 2.5.2 (r252:60911, Jul 31 2008, 17:28:52)
    [GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)] on linux2
    'Linux-2.6.24-19-generic-i686-with-debian-lenny-sid'
    """

I am also not sure why:

    print a             # works
    print unicode(a)    # works
    print [a]           # works
    print unicode([a])  # doesn't work

Answers from the Python group: http://groups.google.com/group/comp.lang.python/browse_thread/thread/bd7ced9e4017d8de/2e0b07c761604137?lnk=gst&q=unicode#2e0b07c761604137

+4

python unicode

Anurag uniyal May 09 '09 at 4:51

7 answers

Try:

 s2 = u"%s"%[unicode(a)]

The main problem is that you are doing more conversions than you expect. Let's look at the following:

 s2 = u"%s"%[a] # gives unicode decode error

From the Python documentation:

    's'  String (converts any Python object using str()).
         If the object or format provided is a unicode string,
         the resulting string will also be unicode.

When the %s format is processed, str([a]) is applied. What you have at that point is a str object containing a sequence of UTF-8 encoded bytes. If you print it, there is no problem, because the bytes go straight to your terminal and the terminal renders them.

    >>> x = "%s" % [a]
    >>> print x
    [©au]

The problem occurs when you try to convert this back to unicode. In effect, the unicode function is called on a str that contains UTF-8 encoded bytes, and the default ascii codec cannot decode them.

    >>> u"%s" % x
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 1: ordinal not in range(128)
    >>> unicode(x)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 1: ordinal not in range(128)
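
The failure described above can be reproduced without the class at all. This is only a minimal sketch, not the thread's original session: it uses explicit encode/decode calls so that it also runs on Python 3, where bytes plays the role of the Python 2 str. The variable names are illustrative assumptions.

```python
# -*- coding: utf-8 -*-
# Sketch: UTF-8 encoded bytes cannot be decoded with the ASCII codec.

text = u"[\u00a9au]"            # the unicode text "[©au]"
encoded = text.encode("utf-8")  # byte string, like the one built from __repr__

# Decoding with the codec that was used for encoding round-trips cleanly.
round_trip = encoded.decode("utf-8")

# Decoding with ASCII (the default in Python 2) fails on the first byte of "©".
failed = False
bad_position = None
try:
    encoded.decode("ascii")
except UnicodeDecodeError as exc:
    failed = True
    bad_position = exc.start    # index of the offending byte (0xc2)

print(round_trip == text, failed, bad_position)
```

The offending byte sits at index 1, right after the opening bracket, which matches the "position 1" reported in the traceback.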

+3

saffsd May 09 '09 at 6:17

First of all, ask yourself what you are trying to achieve. If all you want is a round-trippable representation of the list, you should simply do the following:

    class A(object):
        def __unicode__(self):
            return u"©au"
        def __repr__(self):
            return repr(unicode(self))
        __str__ = __repr__

    >>> A()
    u'\xa9au'
    >>> [A()]
    [u'\xa9au']
    >>> u"%s" % [A()]
    u"[u'\\xa9au']"
    >>> "%s" % [A()]
    "[u'\\xa9au']"
    >>> print u"%s" % [A()]
    [u'\xa9au']

That is how it is supposed to work. The string representation of a Python list is not something the user should see, so it makes sense for it to contain escaped characters.

+2

itsadok May 11, '09 at 6:49

If you want to use a list of objects that support unicode() to create a unicode string, try something like:

 u''.join([unicode(v) for v in [a,a]])

+1

Alan rowowarth May 09 '09 at 8:43

Since this question involves a lot of confusing unicode stuff, I thought I'd offer an analysis of what is going on here.

It all comes down to how the built-in list class implements __unicode__ and __repr__. Basically, it is equivalent to:

    class list(object):
        def __repr__(self):
            return "[%s]" % ", ".join(repr(e) for e in self.elements)
        def __str__(self):
            return repr(self)
        def __unicode__(self):
            return str(self).decode()

Actually, list does not even define the __unicode__ and __str__ methods, which makes sense when you think about it.

When you write:

    u"%s" % [a]                          # it expands to
    u"%s" % unicode([a])                 # which expands to
    u"%s" % repr([a]).decode()           # which expands to
    u"%s" % ("[%s]" % repr(a)).decode()  # (simplified a little bit)
    u"%s" % ("[%s]" % unicode(a).encode('utf-8')).decode()

That last line is the expansion of repr(a), using the __repr__ implementation from the question.

So, as you can see, the object is first encoded in UTF-8, only to be decoded afterwards with the system default encoding, which usually does not support all characters.

As some of the other answers mention, you can write your own function, or even subclass list, like so:

    class mylist(list):
        def __unicode__(self):
            return u"[%s]" % u", ".join(map(unicode, self))

Note that this format is not round-trippable. It can even be misleading:

    >>> unicode(mylist([]))
    u'[]'
    >>> unicode(mylist(['']))
    u'[]'

Of course, you can write a quote_unicode function to make it round-trippable, but this is the moment to ask yourself what the point is. The unicode and str functions are meant to create a representation of an object that makes sense to a user. For programmers, there is the repr function. Raw lists are not something a user should ever see. That is why the list class does not implement the __unicode__ method.

To get a somewhat better idea of what happens when, play with this little class:

    class B(object):
        def __unicode__(self):
            return u"unicode"
        def __repr__(self):
            return "repr"
        def __str__(self):
            return "str"

    >>> b = B()
    >>> b
    repr
    >>> [b]
    [repr]
    >>> unicode(b)
    u'unicode'
    >>> unicode([b])
    u'[repr]'

    >>> print b
    str
    >>> print [b]
    [repr]
    >>> print unicode(b)
    unicode
    >>> print unicode([b])
    [repr]
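
The rule that a list always shows its elements through repr() can be checked in a few lines. This is a small sketch under assumptions, not part of the original answer: the class name B3 is hypothetical, and __unicode__ is left out so that the same code also runs on Python 3, where str() takes over the role of unicode().

```python
class B3(object):
    def __repr__(self):
        return "repr"

    def __str__(self):
        return "str"

b = B3()

direct = str(b)          # uses B3.__str__
in_list = str([b])       # str() of a list falls back to its repr(),
                         # which calls repr() on every element
formatted = "%s" % [b]   # %s formatting goes through str(), so the same applies

print(direct, in_list, formatted)
```

Only the direct call reaches __str__. As soon as the object is wrapped in a list, its __repr__ decides what ends up in the string.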

+1

itsadok May 11, '09 at 7:28

    # -*- coding: utf-8 -*-

    class A(object):
        def __unicode__(self):
            return u"©au"
        def __repr__(self):
            return unicode(self).encode('ascii', 'replace')
        __str__ = __repr__

    a = A()

    >>> u"%s" % a
    u'\xa9au'
    >>> u"%s" % [a]
    u'[?au]'

0

Unknown May 09, '09 at 4:54

__repr__ and __str__ should both return str objects, at least up to Python 2.6.x. You get the decode error because repr() tries to convert your result into a str, and that fails.

I believe this has changed in Python 3.x.
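
For reference, a short sketch of the Python 3 behaviour, assuming Python 3 and using hypothetical class names. There, __repr__ must return text (str), which may contain non-ASCII characters, and a __repr__ that returns bytes is rejected with a TypeError.

```python
class TextRepr(object):
    def __repr__(self):
        return "\u00a9au"                  # text containing the "©" character

class BytesRepr(object):
    def __repr__(self):
        return "\u00a9au".encode("utf-8")  # bytes: not accepted as a repr

# The list representation is built from text only, so no decoding step occurs.
list_text = str([TextRepr()])

rejected = False
try:
    repr(BytesRepr())
except TypeError:
    rejected = True                        # repr() refuses a non-str result

print(list_text == "[\u00a9au]", rejected)
```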

0

Laurence gonsalves May 09 '09 at 5:05

Nico · Accepted Answer · 2009-05-09T09:52:02+0000

s1 = u"%s"%a # works

This works because formatting 'a' directly uses its unicode representation (i.e. the __unicode__ method).

When you wrap it in a list, as in "[a]", and then try to put that list into a string, what gets called is unicode([a]) (which for a list is the same as repr), the string representation of the list, which uses "repr(a)" to represent your item. This causes a problem because you are passing a 'str' object (a byte string) that contains the UTF-8 encoded version of 'a', and when the string formatting tries to embed it in the unicode string, it tries to convert it back to a unicode object using the default encoding, i.e. ASCII. Since ASCII does not have the character it is trying to convert, it fails.

What you want to do would have to be done this way: u"%s" % repr([a]).decode('utf-8'), assuming all your elements encode to UTF-8 (or ASCII, which is a subset of UTF-8 as far as Unicode is concerned).

For a better solution (if you still want the string to look like the str of a list), you will have to use what was suggested earlier and use join, like this:

    u'[%s]' % u','.join(unicode(x) for x in [a,a])

Note, though, that this will not handle a list that contains a list of your A objects.
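
One way to cover nested lists is to apply the same join recursively. The following is only a sketch under assumptions, not something proposed in the thread: format_nested and text_type are hypothetical names, and A is redefined here so that the block runs on its own on both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
text_type = type(u"")  # unicode on Python 2, str on Python 3

class A(object):
    def __unicode__(self):   # picked up by unicode() on Python 2
        return u"\u00a9au"
    # Picked up by str() on Python 3. Assumption: on Python 2 only
    # text_type() is ever applied to A, never str() or a bare print.
    __str__ = __unicode__

def format_nested(value):
    """Hypothetical helper: render nested lists/tuples as text without repr()."""
    if isinstance(value, (list, tuple)):
        inner = u", ".join(format_nested(item) for item in value)
        return u"[%s]" % inner
    return text_type(value)

a = A()
result = format_nested([a, [a, a], []])
print(result == u"[\u00a9au, [\u00a9au, \u00a9au], []]")
```

Like the flat join, this output is meant for display only. It is not round-trippable, because strings are inserted without quotes.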

My explanation sounds terribly obscure, but I hope you can make some sense of it.
