How to split a Unicode string into a list of characters

I have the following code:

 stru = "۰۱۲۳۴۵۶۷۸۹"
 strlist = stru.decode("utf-8").split()
 print strlist[0]

My output:

 ۰۱۲۳۴۵۶۷۸۹ 

But when I use:

 print strlist[1] 

I get the following error:

 IndexError: list index out of range 

My question is: how can I split my string into its individual characters? Note that I get the string from a function, so treat it as a variable rather than a literal.

+8
python string unicode utf-8 unicode-string
3 answers

By default, split() splits on whitespace. Your string contains no whitespace, so strlist is a list with a single element: the entire string, in strlist[0]. That is why strlist[1] raises an IndexError.
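The behaviour described above can be checked with a minimal sketch. It assumes the function returns a UTF-8 encoded byte string (a Python 2 str, or bytes in Python 3); stru is built by hand here as a stand-in for that return value, and the spaced string is a made-up example for comparison.

```python
# -*- coding: utf-8 -*-
# stru stands in for the UTF-8 encoded byte string returned by the function
# in the question (an assumption; the real function is not shown).
stru = u"۰۱۲۳۴۵۶۷۸۹".encode("utf-8")

text = stru.decode("utf-8")

# The text contains no whitespace, so split() returns it as one element.
parts = text.split()
print(len(parts))  # 1

# With whitespace between the digits, split() yields one element per digit.
spaced = u"۰ ۱ ۲".split()
print(len(spaced))  # 3
```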

If you need a list with one element for each Unicode code point, you can convert it to a list in different ways:

  • The list() constructor: list(stru.decode("utf-8"))
  • A list comprehension: [item for item in stru.decode("utf-8")]
  • Don't convert at all. Do you really need a list? You can iterate over a Unicode string just like any other sequence (for character in stru.decode("utf-8"): ...).
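The three options above can be compared side by side in a short sketch. As before, stru is a hand-built stand-in for the byte string from the question, and the names chars_a, chars_b, and seen are only illustrative.

```python
# -*- coding: utf-8 -*-
# Stand-in for the UTF-8 encoded byte string from the question (an assumption).
stru = u"۰۱۲۳۴۵۶۷۸۹".encode("utf-8")
text = stru.decode("utf-8")

# Option 1: the list() constructor.
chars_a = list(text)

# Option 2: a list comprehension.
chars_b = [item for item in text]

# Option 3: no list at all, just iterate over the string directly.
seen = []
for character in text:
    seen.append(character)

print(len(chars_a))               # 10
print(chars_a[1] == u"\u06f1")    # True
print(chars_a == chars_b == seen) # True
```

All three give the same ten one-character strings, so the choice is mostly a matter of whether a real list is needed afterwards.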
+9
  • You don't need to split it. A Unicode string can be indexed directly:

     >>> print u"۰۱۲۳۴۵۶۷۸۹"[1]
     ۱
  • If you still want to ...

     >>> list(u"۰۱۲۳۴۵۶۷۸۹")
     [u'\u06f0', u'\u06f1', u'\u06f2', u'\u06f3', u'\u06f4', u'\u06f5', u'\u06f6', u'\u06f7', u'\u06f8', u'\u06f9']
+14

You can do it like this:

 list(stru.decode("utf-8")) 
+6
