Split a string into runs of repeated characters

I want to split a string such as:

'aaabbccccabbb'

into

 ['aaa', 'bb', 'cccc', 'a', 'bbb'] 

What is an elegant way to do this in Python? If it simplifies things, we can assume that the string contains only the characters a, b, and c.

+8
python
4 answers

This is a use case for itertools.groupby :)

    >>> from itertools import groupby
    >>> s = 'aaabbccccabbb'
    >>> [''.join(y) for _, y in groupby(s)]
    ['aaa', 'bb', 'cccc', 'a', 'bbb']
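To spell out why this works: `groupby` yields one `(key, group)` pair per run of consecutive equal items, and each `group` is an iterator over that run, so joining it rebuilds the substring. A minimal sketch of the same idea wrapped in a function; the name `split_runs` is hypothetical and not part of the answer:

```python
from itertools import groupby


def split_runs(s):
    """Split s into runs of consecutive identical characters."""
    # groupby yields (key, group) for each run of consecutive equal items;
    # group is an iterator over the run, so join it back into a string.
    return [''.join(group) for _key, group in groupby(s)]


print(split_runs('aaabbccccabbb'))  # ['aaa', 'bb', 'cccc', 'a', 'bbb']
print(split_runs(''))               # []
```

Because `groupby` only compares neighbouring items, this works for any characters, not just a, b, and c.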
+26

You can write a generator, without trying to be clever just to make the code short and unreadable:

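    def yield_same(string):
        it_str = iter(string)
        # next(it_str) replaces the Python 2-only it_str.next()
        result = next(it_str, None)
        if result is None:  # empty input: nothing to yield
            return
        for next_chr in it_str:
            if next_chr != result[0]:
                # The run ended: emit it and start a new one
                yield result
                result = ""
            result += next_chr
        yield result  # emit the final run

    >>> list(yield_same("aaaaaabcbcdcdccccccdddddd"))
    ['aaaaaa', 'b', 'c', 'b', 'c', 'd', 'c', 'd', 'cccccc', 'dddddd']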

Edit: OK, so there is itertools.groupby, which probably does something like this.

+3

Here is the best way I could find using a regex:

    import re
    print([a for a, b in re.findall(r"((\w)\2*)", s)])  # s is the input string
+2
    >>> import re
    >>> s = 'aaabbccccabbb'
    >>> [m.group() for m in re.finditer(r'(\w)(\1*)', s)]
    ['aaa', 'bb', 'cccc', 'a', 'bbb']
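One caveat for both regex answers: `\w` matches only letters, digits, and the underscore, so runs of spaces or punctuation are silently skipped. That is fine under the question's assumption of only a, b, and c. A sketch of a variant that should handle any character, using `(.)\1*` with `re.DOTALL`; the function name `split_runs_re` is hypothetical:

```python
import re


def split_runs_re(s):
    """Split s into runs of consecutive identical characters using a regex."""
    # (.) captures a single character and \1* matches its repeats.
    # re.DOTALL lets "." match newline characters as well.
    return [m.group() for m in re.finditer(r'(.)\1*', s, flags=re.DOTALL)]


print(split_runs_re('aaabbccccabbb'))  # ['aaa', 'bb', 'cccc', 'a', 'bbb']
print(split_runs_re('aa  !!b'))        # ['aa', '  ', '!!', 'b']
```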
+1
