I am trying to create a list of all overlapping substrings of length n in a given string.
For example, for n = 6 and the string "hereismystring" I would generate the list ["hereis", "ereism", "reismy", ..., "string"]. The trivial code that I am using now is as follows:
    n = 6
    l = len(string)
    substrings = [string[i:(i + n)] for i in xrange(l - n + 1)]
Simple enough. The problem is that I would like to speed this up (I have a lot of very long strings). Is there a faster way to do this in Python? Would dropping down to Cython even help, given that Python's string routines are already implemented in C?
For reference, this method takes about 100 µs on my machine (a new MacBook Pro) for a string of length 500 and n = 30.
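In case it helps anyone reproduce the measurement, here is a minimal sketch of how such a timing could be taken with the standard `timeit` module. It assumes Python 3 (so `range` instead of `xrange`); the helper names `substrings_of_length` and `time_per_call_us` and the random 500-character test string are illustrative assumptions, not part of my actual code.

```python
import random
import string as string_module
import timeit


def substrings_of_length(text, n):
    """Return every overlapping substring of length n, left to right."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def time_per_call_us(text, n, repeats=5, number=1000):
    """Best-of-`repeats` average time for one call, in microseconds."""
    timer = timeit.Timer(lambda: substrings_of_length(text, n))
    best = min(timer.repeat(repeat=repeats, number=number))
    return best / number * 1e6


if __name__ == "__main__":
    random.seed(0)  # fixed seed so the test string is the same on every run
    # Hypothetical input: 500 random lowercase letters, matching the sizes above.
    sample = "".join(random.choice(string_module.ascii_lowercase) for _ in range(500))
    print("%.1f us per call" % time_per_call_us(sample, 30))
```

Taking the minimum over several repeats is meant to reduce noise from other processes; the absolute number will of course differ by machine and Python version.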
Thanks for the help in advance!