Since this question involves a lot of confusing Unicode behavior, I thought I would offer an analysis of what is going on here.
It all comes down to how the built-in list class implements __unicode__ and __repr__. Basically, it is equivalent to:
    class list(object):
        def __repr__(self):
            return "[%s]" % ", ".join(repr(e) for e in self.elements)
        def __str__(self):
            return repr(self)
        def __unicode__(self):
            return str(self).decode()
Actually, list does not even define __unicode__ and __str__, which makes sense when you think about it.
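A quick way to check that claim is to look at the class dictionary directly. This is only a small sketch, written so that it should run unchanged on Python 2 and Python 3; the helper name `defines` is made up for illustration.

```python
def defines(cls, name):
    """Return True if cls itself (not one of its base classes) defines name."""
    return name in vars(cls)

list_defines_repr = defines(list, "__repr__")
list_defines_str = defines(list, "__str__")
list_has_unicode = hasattr(list, "__unicode__")

# With no __str__ of its own, list inherits object.__str__, which calls repr().
sample = [1, "two"]
same_output = str(sample) == repr(sample)

print(list_defines_repr)
print(list_defines_str)
print(list_has_unicode)
print(same_output)
```

Only __repr__ lives on list itself, so str() of a list ends up producing the same text as repr().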
When you write:
    u"%s" % [a]
the list is converted with unicode([a]), which amounts to repr([a]).decode(), and that in turn calls repr(a), using the __repr__ implementation from the question.
So, as you can see, the object is first encoded as UTF-8, only to be decoded again afterwards with the system default encoding, which usually does not support all characters.
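The failure can be reproduced in isolation. The sketch below assumes the default encoding is ASCII (as on a stock Python 2 install) and passes "ascii" explicitly so that it behaves the same on Python 2 and Python 3; the sample string is made up.

```python
text = u"caf\u00e9"  # made-up sample containing one non-ASCII character

# Step 1: the object is encoded as UTF-8.
encoded = text.encode("utf-8")

# Step 2: the resulting bytes are decoded with the default encoding.
# Assumption: that default is ASCII, so it is spelled out here.
try:
    decoded = encoded.decode("ascii")
    failed = False
except UnicodeDecodeError:
    decoded = None
    failed = True

print(failed)
```

Pure ASCII text survives this encode/decode pair, which is why the problem only shows up once a non-ASCII character is involved.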
As mentioned in some of the other answers, you can write your own function, or even subclass list, for example:
    class mylist(list):
        def __unicode__(self):
            return u"[%s]" % u", ".join(map(unicode, self))
Note that this format does not round-trip. It can even be misleading:
    >>> unicode(mylist([]))
    u'[]'
    >>> unicode(mylist(['']))
    u'[]'
Of course, you could write a quote_unicode function to make it round-trip, but this is the moment to ask yourself what the point is. The unicode and str functions are meant to create a representation of an object that makes sense to a user. For programmers, there is the repr function. Raw lists are not something a user should ever see. That is why the list class does not implement the __unicode__ method.
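For what it is worth, a quote_unicode helper along those lines might look roughly like the sketch below. Both the body of quote_unicode and the format_list function are hypothetical, and the escaping rules are just one possible choice; the code is written to run on Python 2 and Python 3.

```python
def quote_unicode(text):
    # Hypothetical helper: wrap text in single quotes,
    # escaping backslashes first and then single quotes.
    escaped = text.replace(u"\\", u"\\\\").replace(u"'", u"\\'")
    return u"'" + escaped + u"'"

def format_list(items):
    # Hypothetical stand-in for mylist.__unicode__ that quotes every element.
    return u"[" + u", ".join(quote_unicode(item) for item in items) + u"]"

empty = format_list([])
one_empty_string = format_list([u""])

print(empty)
print(one_empty_string)
```

With the quoting in place, an empty list and a list holding one empty string no longer produce the same text, at the cost of showing the user quoting syntax they probably do not care about.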
To get a somewhat better idea of what happens when, play with this small class:
    class B(object):
        def __unicode__(self):
            return u"unicode"
        def __repr__(self):
            return "repr"
        def __str__(self):
            return "str"

    >>> b = B()
    >>> b
    repr
    >>> [b]
    [repr]
    >>> unicode(b)
    u'unicode'
    >>> unicode([b])
    u'[repr]'
    >>> print b
    str
    >>> print [b]
    [repr]
    >>> print unicode(b)
    unicode
    >>> print unicode([b])
    [repr]