Summing Pythonically Sequential Ranges

Question

Summing Pythonically Sequential Ranges

I have a sumranges () function that sums all ranges of consecutive numbers found in a tuple of tuples. To illustrate:

def sumranges(nums): return sum([sum([1 for j in range(len(nums[i])) if nums[i][j] == 0 or nums[i][j - 1] + 1 != nums[i][j]]) for i in range(len(nums))]) >>> nums = ((1, 2, 3, 4), (1, 5, 6), (19, 20, 24, 29, 400)) >>> print sumranges(nums) 7

As you can see, it returns the number of ranges of consecutive digits within the tuple, that is: len ((1, 2, 3, 4), (1), (5, 6), (19, 20), (24), ( 29), (400)) = 7. Tuples are always ordered.

My problem is that my sumranges () are terrible. I hate looking at him. Currently, I just iterate over the tuple and each subhead, assigning 1 if the number is not (1 + the previous number) and sums the total. I feel that I am missing a much simpler way to fulfill my stated goal. Does anyone know a more pythonic way to do this?

Edit: I compared all the answers given so far. Thank you all for your answers.

The benchmarking code is as follows using a sample size of 100K:

 from time import time from random import randrange nums = [sorted(list(set(randrange(1, 10) for i in range(10)))) for j in range(100000)] for func in sumranges, alex, matt, redglyph, ephemient, ferdinand: start = time() result = func(nums) end = time() print ', '.join([func.__name__, str(result), str(end - start) + ' s'])

The results are as follows. The actual answer showed that all functions return the correct answer:

 sumranges, 250281, 0.54171204567 s alex, 250281, 0.531121015549 s matt, 250281, 0.843333005905 s redglyph, 250281, 0.366822004318 s ephemient, 250281, 0.805964946747 s ferdinand, 250281, 0.405596971512 s

RedGlyph really ends up at speed, but the simplest answer is probably Ferdinand and probably wins for most pythons.

+7

python

Brent newey Nov 03 '09 at 16:32

source share

7 answers

Consider:

 >>> nums = ((1, 2, 3, 4), (1, 5, 6), (19, 20, 24, 29, 400)) >>> flat = [[(x - i) for i, x in enumerate(tu)] for tu in nums] >>> print flat [[1, 1, 1, 1], [1, 4, 4], [19, 19, 22, 26, 396]] >>> import itertools >>> print sum(1 for tu in flat for _ in itertools.groupby(tu)) 7 >>>

we "smooth" the "increasing ramps" by subtracting the index from the value, turning them into successive "runs" of the same values; then we identify and can "run" with the precious itertools.groupby . This seems to be a pretty elegant (and quick) solution to your problem.

+9

Alex martelli Nov 03 '09 at 16:49

source share

Just to show something closer to the source code:

 def sumranges(nums): return sum( (1 for i in nums for j, v in enumerate(i) if j == 0 or v != i[j-1] + 1) )

The idea here was as follows:

Avoid creating staging lists, but use a generator instead, it will save some resources.
Avoid using indexes when you have already selected a subitem (i and v above).

The rest of sum() is still needed in my example.

+7

Redglyph Nov 03 '09 at 16:59

source share

Here is my attempt:

 def ranges(ls): for l in ls: consec = False for (a,b) in zip(l, l[1:]+(None,)): if b == a+1: consec = True if b is not None and b != a+1: consec = False if consec: yield 1 ''' >>> nums = ((1, 2, 3, 4), (1, 5, 6), (19, 20, 24, 29, 400)) >>> print sum(ranges(nums)) 7 '''

He looks at the numbers in pairs, checking to see if they are a sequential pair (unless it is in the last element of the list). Each time there is a consecutive pair of numbers, it gives 1.

+1

Steven kryskalla Nov 03 '09 at 18:09

source share

This could probably be compiled in a more compact form, but I think that clarity will suffer:

 def pairs(seq): for i in range(1,len(seq)): yield (seq[i-1], seq[i]) def isadjacent(pair): return pair[0]+1 == pair[1] def sumrange(seq): return 1 + sum([1 for pair in pairs(seq) if not isadjacent(pair)]) def sumranges(nums): return sum([sumrange(seq) for seq in nums]) nums = ((1, 2, 3, 4), (1, 5, 6), (19, 20, 24, 29, 400)) print sumranges(nums) # prints 7

0

Matt anderson Nov 03 '09 at 16:53

source share

Perhaps you could do it better if you had an IntervalSet class , because then you scanned your ranges to build your IntervalSet, and then just use the number of elements in the set.

Some tasks do not always lend themselves to clear code, especially if you need to write code for performance.

0

Michael dillon Nov 03 '09 at 16:54

source share

There is a formula for this sum, the sum of the first n numbers, 1+ 2 + ... + n = n (n + 1) / 2. Then, if you want to get the sum ij, then this is (j (j + 1) / 2) - (i (i + 1) / 2), I am sure it simplifies, but you can fix it. It may not be pythonic, but it is what I will use.

0

Vincent Nov 03 '09 at 17:18

source share

Ferdinand beyer · Accepted Answer · 2009-11-03T17:12:07+0000

My 2 cents:

 >>> sum(len(set(x - i for i, x in enumerate(t))) for t in nums) 7

Basically the same idea as described in Alex 'post , but using set instead of itertools.groupby , which leads to a shorter expression. Since set implemented in C and the len() set is executed in constant time, this should also be pretty fast.

Summing Pythonically Sequential Ranges

More articles: