Search for clusters of numbers in a list

I am struggling with this since I am sure that a dozen for-loops are not a solution to this problem:

There is a sorted list of numbers like

numbers = [123, 124, 128, 160, 167, 213, 215, 230, 245, 255, 257, 400, 401, 402, 430] 

and I want to create a dict with lists of numbers, with the difference in numbers (following each other) not more than 15. Thus, the output will be as follows:

 clusters = { 1 : [123, 124, 128], 2 : [160, 167], 3 : [213, 215, 230, 245, 255, 257], 4 : [400, 401, 402], 5 : [430] } 

My current solution is a bit ugly (I need to remove duplicates at the end ...), I'm sure it can be done with pythonic.

Here is what I am doing now:

 clusters = {} dIndex = 0 for i in range(len(numbers)-1) : if numbers[i+1] - numbers[i] <= 15 : if not clusters.has_key(dIndex) : clusters[dIndex] = [] clusters[dIndex].append(numbers[i]) clusters[dIndex].append(numbers[i+1]) else : dIndex += 1 
+7
source share
4 answers

It’s not necessary if your list is small, but I probably approach it in the “streaming processing” mode: define a generator that will make your input iterable and give out elements grouped into runs of numbers that differ by <= 15. Then you can use this to easily generate a dictionary.

 def grouper(iterable): prev = None group = [] for item in iterable: if not prev or item - prev <= 15: group.append(item) else: yield group group = [item] prev = item if group: yield group numbers = [123, 124, 128, 160, 167, 213, 215, 230, 245, 255, 257, 400, 401, 402, 430] dict(enumerate(grouper(numbers), 1)) 

prints:

 {1: [123, 124, 128], 2: [160, 167], 3: [213, 215, 230, 245, 255, 257], 4: [400, 401, 402], 5: [430]} 

As a bonus, this even allows you to group your runs for potentially endless lists (if they are sorted, of course). You can also insert part of the index generation into the generator itself (instead of using enumerate ) as a minor improvement.

+9
source
 import itertools import numpy as np numbers = np.array([123, 124, 128, 160, 167, 213, 215, 230, 245, 255, 257, 400, 401, 402, 430]) nd = [0] + list(np.where(np.diff(numbers) > 15)[0] + 1) + [len(numbers)] a, b = itertools.tee(nd) next(b, None) res = {} for j, (f, b) in enumerate(itertools.izip(a, b)): res[j] = numbers[f:b] 

If you can use itertools and numpy. Adapted pairwise for iterator tricks. An index shift requires +1 by adding 0 and len(numbers) to the list to make sure the first and last entries are included correctly.

You can do it without itertools , but I like tee .

+3
source

Here's a relatively simple solution that works for lists or generators. It lazily gives pairs (group_number, element) , so you have to do the actual grouping separately if you need it. (Or maybe you just need a group number.)

  from itertools import tee def group(xs, gap=15): # use `tee` to get two efficient iterators xs1, xs2 = tee(xs) # the first element is in group 0, also advance the second iterator group = 0 yield (group, next(xs2)) # after advancing xs2, this zip is pairs of consecutive elements for x, y in zip(xs1, xs2): # whenever the gap is too large, increment the group number if y - x > gap: group += 1 # and yield the second number in the pair yield group, y 
+1
source

Using a generator to separate logic: (one function does one thing)

 numbers = [123, 124, 128, 160, 167, 213, 215, 230, 245, 255, 257, 400, 401, 402, 430] def cut_indices(numbers): # this function iterate over the indices that need to be 'cut' for i in xrange(len(numbers)-1): if numbers[i+1] - numbers[i] > 15: yield i+1 def splitter(numbers): # this function split the original list into sublists. px = 0 for x in cut_indices(numbers): yield numbers[px:x] px = x yield numbers[px:] def cluster(numbers): # using the above result, to form a dict object. cluster_ids = xrange(1,len(numbers)) return dict(zip(cluster_ids, splitter(numbers))) print cluster(numbers) 

The above codes give me

 {1: [123, 124, 128], 2: [160, 167], 3: [213, 215, 230, 245, 255, 257], 4: [400, 401, 402], 5: [430]} 
0
source

All Articles