Odd behavior when iterating over a unicode string

When I do this:

text = u"奥巴马讲话"
for c in text:
    print c

I get the expected result:

奥
巴
马
讲
话

But if I do this:

text = u"𤭢€"
for c in text:
    print c

I get:

 
 

I expect to receive:

𤭢

Why is this? I think this has something to do with the following fact:

In [1]: u"𤭢".encode("utf8")
Out[1]: '\xf0\xa4\xad\xa2'

"𤭢" is encoded using 4 bytes.

How can I iterate over a unicode string that contains characters encoded like this?

Something like u"𤭢 𤭢 𤭢 𤭢 𤭢 𤭢".

1 answer

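𤭢 is outside the Basic Multilingual Plane; its code point is U+24B62. On a "narrow" Python 2 build, where sys.maxunicode == 65535, such a character is stored as two UTF-16 surrogate code units, and iterating over the string yields each half separately. To process it correctly you need a "wide" Python build, where sys.maxunicode == 1114111. See "Unicode in Python - UTF-16 only?" for more details.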
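
If switching builds is not an option, one possible workaround is to join the surrogate pairs yourself while iterating. This is only a minimal sketch: iter_code_points is a hypothetical helper name, not a standard library function, and it assumes the string holds well-formed UTF-16 pairs (a lone surrogate is passed through unchanged).

```python
from __future__ import print_function, unicode_literals
import sys


def iter_code_points(text):
    """Yield one string per code point, joining UTF-16 surrogate pairs."""
    i = 0
    n = len(text)
    while i < n:
        ch = text[i]
        # A high surrogate (U+D800..U+DBFF) followed by a low surrogate
        # (U+DC00..U+DFFF) represents a single character outside the BMP.
        if ('\ud800' <= ch <= '\udbff' and i + 1 < n
                and '\udc00' <= text[i + 1] <= '\udfff'):
            yield text[i:i + 2]
            i += 2
        else:
            yield ch
            i += 1


if __name__ == "__main__":
    text = "\U00024B62\u20ac"  # the two characters from the question
    # True on a wide build (and on Python 3.3+), False on a narrow build.
    print(sys.maxunicode > 0xFFFF)
    # Two items on either kind of build.
    print(len(list(iter_code_points(text))))
```

Using "for c in iter_code_points(text): print(c)" in place of "for c in text: print c" should then print one character per line, even on a narrow build.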

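Alternatively, switch to Python 3.3 or later, where the narrow/wide distinction is gone and strings are no longer exposed as UTF-16 code units, so iteration always yields whole code points. See also the related question on Unicode in Python 3.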
