Does Python find patterns in a large group of numbers?

I work with a list of lists that have periods of continued fractions for imperfect square roots in each of them.

What I'm trying to do with them is to check the size of the largest repeating pattern in each list.

Some of the lists, for example:

[ [1,1,1,1,1,1....], [4,1,4,1,4,1....], [1,2,10,1,2,10....], [1,1,1,1,1,4,1,4,1,20,9,8,1,1,1,1,1,4,1,4,1,20,9,8....], [2,2,2,4,2,2,2,4....], [1,1,1,13,21,45,3,3,1,16,4,1,4,1,1,1,24,15,1,1,1,13,21,45,3,3,1,16,4,1,4,1,1,1,24,15....], [1,1,1,3,28,1,1,1,3,28,67,25,1,1,1,3,28,1,1,1,3,28,67,25....] ] 

Two similar methods that I worked with are as follows:

 def lengths(seq): for i in range(len(seq),1,-1): if seq[0:i] == seq[i:i*2]: return i def lengths(seq): for i in range(1,len(seq)-1): if seq[0:i] == seq[i:i*2]: return i 

They take the size of the lists and compare the indexed sizes of it with the current position. The problem is that at first it returns incorrectly for only one repeating digit, because it starts large and sees only one large pattern. The problem with the second is that there are nested patterns such as the sixth and seventh examples, and this will execute with a nested loop and ignore the rest of the pattern.

+4
source share
5 answers

It works (a typo is caught in the 4th element of your sample)

 >>> seq_l = [ ... [1,1,1,1,1,1], ... [4,1,4,1,4,1], ... [1,2,10,1,2,10], ... [1,1,1,1,1,4,1,4,1,20,9,8,1,1,1,1,1,4,1,4,1,20,9,8], ... [2,2,2,4,2,2,2,4,2,2,2,4,2,2,2,4], ... [1,1,1,13,21,45,3,3,1,16,4,1,4,1,1,1,24,15,1,1,1,13,21,45,3,3,1,16,4,1,4,1,1,1,24,15], ... [1,1,1,3,28,1,1,1,3,28,67,25,1,1,1,3,28,1,1,1,3,28,67,25] ... ] >>> >>> def rep_len(seq): ... s_len = len(seq) ... for i in range(1,s_len-1): ... if s_len%i == 0: ... j = s_len/i ... if seq == j*seq[:i]: ... return i ... ... >>> [rep_len(seq) for seq in seq_l] [1, 2, 3, 12, 4, 18, 12] 
+4
source

If it is not possible to convert your lists to strings, using regular expressions will make this task trivial.

 import re lists = [ [1,1,1,1,1,1], [4,1,4,1,4,1], [1,2,10,1,2,10], [1,1,1,1,1,4,1,4,1,20,9,8,1,1,1,1,1,4,1,4,1,20,9,8], #I think you had a typo in this one... [2,2,2,4,2,2,2,4], [1,1,1,13,21,45,3,3,1,16,4,1,4,1,1,1,24,15,1,1,1,13,21,45,3,3,1,16,4,1,4,1,1,1,24,15], [1,1,1,3,28,1,1,1,3,28,67,25,1,1,1,3,28,1,1,1,3,28,67,25] ] for l in lists: s = "x".join(str(i) for i in l) print s match = re.match(r"^(?P<foo>.*)x?(?P=foo)", s) if match: print match.group('foo') else: print "****" print 

(?P<foo>.*) Creates a group known as "foo" and (?P=foo) matches this. Since regular expressions are greedy, by default you get the longest match. "X?" just allows one x in the middle to process even / odd lengths.

+2
source

You could probably do collection.defaultdict (int) to count all the subscriptions if you don't know that there are some sublists you don't need. Convert sublists to tuples before you make them dictionary keys.

You might be able to use a series of flowering filters somewhere, though, if the space is tight. You will have one flowering filter for subsequences of length 1, another for subsequences of length 2, etc. Then the largest flowering filter that receives a collision has the largest substring of length.

http://stromberg.dnsalias.org/~strombrg/drs-bloom-filter/

+1
source

I think you just need to check two levels of sequences at once. 0..i == i..i*2 and 0..i/2 != i/2..i .

 def lengths(seq): for i in range(len(seq),1,-1): if seq[0:i] == seq[i:i*2] and seq[0:i/2] != seq[i/2:i]: return i 

If the two halves of 0..i are equal, then this means that you are actually comparing the two concatenated patterns with each other.

0
source

Starting with the first example method, you can recursively search for a subset.

 def lengths(seq): for i in range(len(seq)-1,1,-1): if seq[0:i] == seq[i:i*2]: j = lengths(seq[0:i]) # Search pattern for sub pattern if j < i and i % j == 0: # Found a smaller pattern; further, a longer repeated # pattern length must be a multiple of the shorter pattern length n = i/j # Number of pattern repetitions (might change to // if using Py3K) for k in range(1, n): # Check that all the smaller patterns are the same if seq[0:j] != seq[j*n:j*(n+1)]: # Stop when we find a mismatch return i # Not a repetition of smaller pattern else: return j # All the sub-patterns are the same, return the smaller length else: return i # No smaller pattern 

I feel that this solution is not entirely correct, but I will do some tests and edit them as necessary. (Quick note: should you start a loop to start at len ​​(seq) -1? If not, do you compare seq [0: len] with seq [len: len], which seems silly, and will invoke the recursion loop endlessly.)

Edit: It seems like something seems like the best answer on the senderle related question, so you better just read this.;)

0
source

All Articles