How to split a unicode string into a list

Question

I have the following code:

 stru = "۰۱۲۳۴۵۶۷۸۹"
 strlist = stru.decode("utf-8").split()
 print strlist[0]

My output:

 ۰۱۲۳۴۵۶۷۸۹

But when I use:

 print strlist[1]

I get the following traceback:

 IndexError: list index out of range

My question is: how can I split my string into its individual characters? Note that I get the string from a function, so treat it as a variable rather than a literal.

+8

python string unicode utf-8 unicode-string

Persiangulf 10 Sep '13 at 5:37


3 answers

You don't need to. A Unicode string can be indexed directly:

 >>> print u"۰۱۲۳۴۵۶۷۸۹"[1]
 ۱

If you still want to ...

 >>> list(u"۰۱۲۳۴۵۶۷۸۹")
 [u'\u06f0', u'\u06f1', u'\u06f2', u'\u06f3', u'\u06f4', u'\u06f5', u'\u06f6', u'\u06f7', u'\u06f8', u'\u06f9']

+14

Ignacio Vazquez-Abrams 10 Sep '13 at 5:42


You can do it like this:

 list(stru.decode("utf-8"))

+6

Roman pekar 10 Sep '13 at 5:42


chryss · Accepted Answer · 2013-09-10T05:45:01+0000

The split() method splits on whitespace by default. Your string contains no whitespace, so strlist is a list with a single element: the entire string, in strlist[0]. That is why strlist[1] raises an IndexError.
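The behaviour described above can be checked with a small sketch. It assumes Python 3, where str is already Unicode (the code in the question is Python 2); the `spaced` variant is a made-up string added only for contrast.

```python
# Python 3 sketch; in Python 2 these literals would need a u"" prefix.
digits = "۰۱۲۳۴۵۶۷۸۹"        # the string from the question: no whitespace
spaced = "۰۱۲ ۳۴۵ ۶۷۸۹"      # hypothetical variant containing two spaces

# split() with no arguments only cuts at whitespace.
no_space_parts = digits.split()
spaced_parts = spaced.split()

print(len(no_space_parts))   # one element: the whole string
print(len(spaced_parts))     # three elements, one per space-separated group
```

With only one element in the list, index 0 is the only valid index.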

If you need a list with one element for each Unicode code point, you can convert it to a list in different ways:

With the list() constructor: list(stru.decode("utf-8"))
With a list comprehension: [item for item in stru.decode("utf-8")]
Without converting at all. Do you really need a list? You can index or iterate over a Unicode string just like any other sequence type (for character in stru.decode("utf-8"): ...).
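The three options above can be compared side by side in a minimal sketch. It assumes Python 3, where decoding bytes yields a str; `get_digits()` is a hypothetical stand-in for the function that returns the UTF-8 encoded string mentioned in the question.

```python
# Python 3 sketch. get_digits() is a hypothetical placeholder for the
# function that hands back the UTF-8 encoded bytes.
def get_digits():
    return "۰۱۲۳۴۵۶۷۸۹".encode("utf-8")

text = get_digits().decode("utf-8")   # bytes -> Unicode text

# Option 1: list() constructor, one element per code point.
as_list = list(text)

# Option 2: list comprehension, same result.
as_comprehension = [item for item in text]

# Option 3: no conversion; index or iterate over the string directly.
second = text[1]
for character in text:
    print(character)
```

Options 1 and 2 produce equal lists. Option 3 avoids building a list at all, which is enough when you only need to read individual characters.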