A Pythonic way is to use the zip() function inside a list comprehension:
>>> s = 'abbbcppq'
>>>
>>> [i+j for i,j in zip(s,s[1:]) if i==j]
['bb', 'bb', 'pp']
If you are dealing with a large string, you can avoid the copy made by s[1:]: convert the string to an iterator with iter(), use itertools.tee() to create two independent iterators, call next() on the second iterator to consume its first element, and then pass both iterators to zip() (in Python 2.x use itertools.izip(), which returns an iterator).
>>> from itertools import tee
>>> first = iter(s)
>>> second, first = tee(first)
>>> next(second)
'a'
>>> [i+j for i,j in zip(first,second) if i==j]
['bb', 'bb', 'pp']
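The iterator steps above can also be wrapped in a small generator. This is only a minimal sketch: the name adjacent_pairs is hypothetical (it is not part of the original answer), and it assumes Python 3, where zip() is lazy.

```python
from itertools import tee


def adjacent_pairs(iterable):
    """Yield i + j for every pair of equal neighbouring items.

    `adjacent_pairs` is a hypothetical helper name used for illustration.
    """
    first, second = tee(iter(iterable))
    # Advance the second iterator by one item so it runs one step ahead.
    # The default of None avoids StopIteration on empty input.
    next(second, None)
    for i, j in zip(first, second):
        if i == j:
            yield i + j


print(list(adjacent_pairs('abbbcppq')))  # ['bb', 'bb', 'pp']
```

Because the function is a generator, the pairs are produced one at a time and no sliced copy of the string is created.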
Benchmark against the regex approach:
# ZIP
~ $ python -m timeit --setup "s='abbbcppq'" "[i+j for i,j in zip(s,s[1:]) if i==j]"
1000000 loops, best of 3: 1.56 usec per loop
# REGEX
~ $ python -m timeit --setup "s='abbbcppq';import re" "[i[0] for i in re.findall(r'(([a-z])\2)', 'abbbbcppq')]"
100000 loops, best of 3: 3.21 usec per loop
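The same comparison can be run from inside Python with the timeit module. The sketch below is an assumption-laden illustration: the function names with_zip and with_regex and the loop count are made up for this example, and the timings will differ from the figures above depending on the machine and Python version.

```python
import re
import timeit

s = 'abbbcppq'


def with_zip():
    # Overlapping pairs: compares every character with its successor.
    return [i + j for i, j in zip(s, s[1:]) if i == j]


def with_regex():
    # findall() returns (whole_match, character) tuples for the two groups.
    return [i[0] for i in re.findall(r'(([a-z])\2)', s)]


# The number of loops is an arbitrary choice for illustration.
zip_time = timeit.timeit(with_zip, number=100000)
regex_time = timeit.timeit(with_regex, number=100000)
print('zip:   %.3f s' % zip_time)
print('regex: %.3f s' % regex_time)
```

Note that the two snippets are not strictly equivalent: the zip() version reports overlapping pairs, while the regex consumes each matched pair before continuing.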
Following the latest edit mentioned in the comments: if you want to match only one pair of b in strings like "abbbcppq", you can use re.finditer(), which returns an iterator of match objects, and retrieve each result with the group() method:
>>> import re
>>>
>>> s = "abbbcppq"
>>> [item.group(0) for item in re.finditer(r'([a-z])\1',s,re.I)]
['bb', 'pp']
Note that re.I is the IGNORECASE flag, which makes the regex match uppercase letters as well.
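To see what the flag changes, here is a small sketch on a mixed-case string. The sample string 'aAbbCC' is an assumption chosen for illustration and does not come from the question. With re.I the backreference is also compared case-insensitively, so a pair such as 'aA' counts as a match.

```python
import re

pattern = r'([a-z])\1'
s = 'aAbbCC'  # hypothetical sample input

# Without the flag, only identical lowercase pairs match.
without_flag = [m.group(0) for m in re.finditer(pattern, s)]

# With re.I, uppercase and mixed-case pairs match as well.
with_flag = [m.group(0) for m in re.finditer(pattern, s, re.I)]

print(without_flag)  # ['bb']
print(with_flag)     # ['aA', 'bb', 'CC']
```

If mixed-case pairs like 'aA' should not count, one option is to drop the flag and use a character class such as [a-zA-Z] instead.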
Kasramvd, Dec 14 '15 at 7:03