Recording ranges of numbers with dashes

I did not quite know how to ask this question or even search for an answer on Google, but I will write it here. I have a sorted list of integers that correspond to line numbers in a file. I would like to convert them to strings, but for numbers that are sequential, I want the string to have the first number of the sequence, a dash, and then the last number. Here is an example:

line_nums = [ 1, 2, 3, 5, 7, 8, 9, 10 ] 

I want to turn this list into:

 [ '1-3', '5', '7', '8-10' ] 

I wrote code that works for the most part. In some sequences, it will put the same number in a string twice. In the recent execution of this code, the input was:

 [ 10007, 10008, 10009, 10010, 10011, 10013, 10015, 10016, 10017, 10018, 10019 ] 

But I came back:

 [ '10007-10011', '10013-10013', '10015-10019' ] 

Here is my code:

 def get_line_numbers_concat(line_nums): seq = [] final = [] last = 0 for index, val in enumerate(line_nums): if last + 1 == val or index == 0: seq.append(val) last = val else: final.append(str(seq[0]) + '-' + str(seq[len(seq)-1])) seq = [] seq.append(val) last = val if index == len(line_nums) - 1: if len(seq) > 1: final.append(str(seq[0]) + '-' + str(seq[len(seq)-1])) else: final.append(str(seq[0])) final_str = ', '.join(map(str, final)) return final_str 
+5
source share
4 answers

You are almost there, except when seq[0] is actually the same element as seq[len(seq)-1] , which is then simplified in the case of len(seq)==1 or as shown below if len(seq) > 1 , you do normal processing, otherwise JUST add the first element.

 def get_line_numbers_concat(line_nums): seq = [] final = [] last = 0 for index, val in enumerate(line_nums): if last + 1 == val or index == 0: seq.append(val) last = val else: if len(seq) > 1: final.append(str(seq[0]) + '-' + str(seq[len(seq)-1])) else: final.append(str(seq[0])) seq = [] seq.append(val) last = val if index == len(line_nums) - 1: if len(seq) > 1: final.append(str(seq[0]) + '-' + str(seq[len(seq)-1])) else: final.append(str(seq[0])) final_str = ', '.join(map(str, final)) return final_str 
+5
source

Perhaps you could change the order of the code a bit so as not to duplicate it in the latter case, but working with what's there:

Looking at the first if ...,

str(seq[len(seq)-1])) will be equal to str(seq[-1]) for a single-valued sequence that will be the same as str(seq[0]) . I think I give you "10013-10013" .

Try adding if len(seq) > 1: above this too and see if this works in terms of suppressing it. You may also need a similar if / else for what you have below to handle a single-number case.

+2
source

You can use OrderedDict, using the beginning of a new sequence as a key and adding values ​​when you go if the last is equal to the current + 1, and then append the first and last elements of the subscriptions if they are more than one or just add a single element:

 from collections import OrderedDict od = OrderedDict() # create iterator it = iter(l) # get first element to use as starting key key = next(it) od[key] = [key] # keep track of previous element prev = key for i in it: # if last element + 1 is equal to the current # add it to the current sequence if prev + 1 == i: od[key].append(i) else: # else start a new sequence adding key key = i od[key] = [i] # update prev prev = i # if a sublist had len > 1 we have a sequence so join first and last # elements using str.format or else we just extract a single element print(["{}-{}".format(sub[0], sub[-1]) if len(sub) > 1 else str(sub[0]) for sub in od.values()]) ['10007-10011', 10013, '10015-10019'] 

You can use key = l[0] , then for i in l[1:] , but slicing creates a new list, so using iter allows us to get the first element using next , which moves the pointer to the second element, which allows us to extract the first element and just iterate over the rest without slicing.

 In [7]: l = [1,2,3,4] In [8]: it = iter(l) In [9]: next(it) # first element Out[9]: 1 In [10]: next(it) # second element ... Out[10]: 2 In [11]: next(it) Out[11]: 3 In [12]: next(it) Out[12]: 4 

when you iterate over an iter object, it is the same as calling next , so when we remove the first element with next , we iterate over the remainder.

 In [13]: l = [1,2,3,4] In [14]: it = iter(l) In [15]: key = next(it) In [16]: key Out[16]: 1 In [17]: for i in it: ....: print(i) ....: 2 3 4 

You can also do this without a dict by setting the True flag if we have at least two in sequence:

 key, out = next(it), [] prev, flag = key, False for i in it: if prev + 1 == i: flag = True else: # if flag is set we have a sequence else just add the key out.append("{}-{}".format(key, prev) if flag else str(key)) # reset flag flag = False key = i prev = i # catch last element out.append("{}-{}".format(key, prev) if flag else str(key)) 
+2
source

I would like to propose an alternative solution that for me looks much simpler and easier to work with.

This is because it looks exactly like a problem that can be very easily solved using left fold, which is exactly what reduce is in python ( http://en.wikipedia.org/wiki/Fold_%28higher-order_function % 29 )

reduce (function, iterable [, initializer])

Apply the function of two arguments cumulatively to the iteration elements from left to right to reduce iterability to a single value. For example, the abbreviation (lambda x, y: x + y, [1, 2, 3, 4, 5]) computes (((((1 + 2) +3) +4) +5). The left argument x is the accumulated value and the right argument, y, is the update value from the iterable. If there is an optional initializer, it is placed before the iterability elements in the calculation and is used by default when the iterability is empty. If no initializer is specified and the iterability contains only one element, the first element is returned. Approximately equivalent:

Simply put, I would iterable , which would be line_nums one value at a time, using a function that will decide if the value is part of already created sequences or not. So I would end up with a list of lists representing sequences of consecutive numbers. Then I will convert them to a range ( xx-yy ) or just one value ( xx ).

So my solution would look like this:

 def make_sequences(sequences, val): if sequences != [] and sequences[-1][-1] == val - 1: return sequences[:-1] + [sequences[-1] + [val]] return sequences + [[val]] def sequence_to_string(s): return '%s-%s' % (s[0], s[-1]) if len(s) > 1 else str(s[0]) def get_line_numbers_concat(line_nums): return ', '.join( sequence_to_string(seq) '%s-%s' % (seq[0], seq[-1]) for seq in reduce(make_sequences, line_nums, []) ) 

The functions sequence_to_string(..) and get_line_numbers_concat(..) pretty simple, so I’ll just explain what happens inside make_sequences(..) :

 def make_sequences(sequences, val): 

The first time it calls sequences will be [] (since it was passed to reduce in get_line_numbers_concat(..) ), the next time it calls the sequence list, the results of make_sequences(..) will be passed as sequences to subsequent calls to make_sequences(..) . To make this clear, it would line_nums out called up using the original line_nums :

 make_sequences([], 10007) ==> [[10007]] make_sequences([[10007]], 10008) ==> [[10007, 10008]] ... make_sequences([[10007, 10008, 10009, 10010, 10011]], 10013) ==> [[10007, 10008, 10009, 10010, 1011], [10013]] ... 

Then it remains for us to decide if val belongs to the last sequence in sequences :

  if sequences != [] and sequences[-1][-1] == val - 1: # (1) 

This ensures that sequences not empty (otherwise we get an index error), and then we check to see if the last number in the last sequence in the sequence (ie sequences[-1][-1] is equal to val - 1 and therefore, val must be added to this last sequence.

This is done here:

  return sequences[:-1] + [sequences[-1] + [val]] 

where we take all the sequences except the last ( sequences[:-1] ) and add a new sequence for them, which is the result of adding val to the last Sequence.

If, however, condition (1) is not true, which means either the absence of previous sequences ( seqences == [] ) or the last number of the last sequence is not less than val . In this case, we add a new sequence with only one val value:

  return sequences + [[val]] 
0
source

All Articles