How to find the number of overlapping sequences in a String with Python?

I have a long sequence, and I would like to know how often some subsequences occur in this sequence.

I know string.count(s, sub), but it counts only non-overlapping occurrences.

Is there a similar function that also counts overlapping occurrences?

+5
4 answers

As an alternative to writing your own search function, you can use the re module:

    In [22]: import re

    In [23]: haystack = 'abababa baba alibababa'

    In [24]: needle = 'baba'

    In [25]: matches = re.finditer(r'(?=(%s))' % re.escape(needle), haystack)

    In [26]: print [m.start(1) for m in matches]
    [1, 3, 8, 16, 18]

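The above prints the starting positions of all (potentially overlapping) matches. The lookahead (?=...) consumes no characters, so the search resumes at the very next position after each match.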

If all you need is a count, the following should do the trick:

    In [27]: len(re.findall(r'(?=(%s))' % re.escape(needle), haystack))
    Out[27]: 5
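For Python 3, the same lookahead idea can be wrapped in a small function. This is only a minimal sketch: the name count_overlapping_regex is an illustrative choice, not something provided by the re module, and rejecting an empty needle is an assumption about what you would want.

```python
import re


def count_overlapping_regex(needle, haystack):
    """Count occurrences of needle in haystack, including overlapping ones."""
    if not needle:
        # Assumption: an empty needle is treated as an error.
        raise ValueError("needle must be non-empty")
    # The lookahead (?=...) matches at a position without consuming text,
    # so the scan moves on one character at a time and overlaps are kept.
    pattern = "(?=%s)" % re.escape(needle)
    return sum(1 for _ in re.finditer(pattern, haystack))


print(count_overlapping_regex('baba', 'abababa baba alibababa'))  # 5
```

re.escape matters here: without it, characters such as . or * in the needle would be interpreted as regex syntax.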
+10

A plain loop is easier to understand:

    def count(sub, string):
        count = 0
        for i in xrange(len(string)):
            if string[i:].startswith(sub):
                count += 1
        return count

    count('baba', 'abababa baba alibababa')
    # output: 5

If you like short snippets, you can make it more compact, at some cost to readability:

    def count(subs, s):
        return sum((s[i:].startswith(subs) for i in xrange(len(s))))

This exploits the fact that Python treats booleans as integers (True is 1 and False is 0), so they can be summed directly.
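On Python 3, where xrange no longer exists, a variant of the same idea might look like the sketch below. It relies on the optional start argument of str.startswith to avoid copying s[i:] on every step. The name count_overlapping_startswith and the empty-substring check are assumptions made for illustration.

```python
def count_overlapping_startswith(sub, s):
    """Count overlapping occurrences of sub in s without slicing s."""
    if not sub:
        # Assumption: an empty substring is treated as an error.
        raise ValueError("sub must be non-empty")
    # str.startswith(sub, i) tests for sub at offset i in place.
    # Each True counts as 1 in the sum; positions too close to the end
    # of s to hold sub are skipped by the range bound.
    return sum(s.startswith(sub, i) for i in range(len(s) - len(sub) + 1))


print(count_overlapping_startswith('baba', 'abababa baba alibababa'))  # 5
```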

+6

This should help you:

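    matches = []
    st = 'abababa baba alibababa'
    needle = 'baba'
    # Try every possible starting position in turn.
    for i in xrange(len(st) - len(needle) + 1):
        # Restricting find to the window [i, i + len(needle)) means it can
        # only succeed when needle starts exactly at position i.
        if st.find(needle, i, i + len(needle)) >= 0:
            matches.append(i)
    print(str(matches))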

see here: http://codepad.org/pmkKXmWB

I have not tested it on long strings, so check whether it is fast enough for your use.

+1

Today I found out that you can pass a start index to find to jump to the next occurrence of your substring:

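    string = 'bobobobobobobob'  # long string or variable here
    count = 0
    start = 0
    while True:
        index = string.find('bob', start)
        if index >= 0:
            count += 1
            # Resume one character past this match so overlaps are found
            # and no match is counted twice.
            start = index + 1
        else:
            break
    print(count)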

Returns 7
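The find-based loop can also be packaged as a reusable function. The sketch below is one possible shape, with the hypothetical name count_overlapping_find. The important step is resuming the search at index + 1, one character past the start of the previous match, rather than past its end.

```python
def count_overlapping_find(sub, s):
    """Count overlapping occurrences of sub in s using str.find."""
    if not sub:
        # Assumption: an empty substring is treated as an error.
        raise ValueError("sub must be non-empty")
    count = 0
    start = 0
    while True:
        index = s.find(sub, start)
        if index == -1:
            # No further occurrence at or after start.
            return count
        count += 1
        # Step one character past the start of this match, not past its
        # end, so that overlapping occurrences are still found.
        start = index + 1


print(count_overlapping_find('bob', 'bobobobobobobob'))  # 7
```

Because find jumps straight to the next match, this version skips the positions where nothing matches instead of testing each one.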

0
