The difference of two sets of intervals

I am trying to write some code to calculate the difference between two sets of intervals A - B, the interval endpoints are integers, but I'm struggling to find an effective solution, any suggestion would be very appreciated Example: [(1, 4), (7, 9)] - [(3,5)] = [(1, 3), (7, 9)]

Ok, this is the best I've tried so far (two lists are already sorted)

class tp(): def __repr__(self): return '(%.2f,%.2f)' % (self.start, self.end) def __init__(self,start,end): self.start=start self.end=end z=[tp(3,5)] #intervals to be subtracted s=[tp(1, 4)),tp(7, 9), tp(3,4),tp(4,6)] for x in s[:]: if z.end < x.start: break elif z.start < x.start and z.end > x.start and z.end < x.end: x.start=z.end elif z.start < x.start and z.end > x.end: s.remove(x) elif z.start > x.start and z.end < x.end: s.append(tp(x.start,z.start)) s.append(tp(z.end,x.end)) s.remove(x) elif z.start > x.start and z.start < x.end and z.end > x.end: x.end=z.start elif z.start > x.end: continue 
+7
python algorithm intervals set-difference
source share
2 answers

The only way to make the operation efficient is to keep the lists of intervals sorted and non-overlapping (which can be done in O(n log n) ).

If both lists are sorted and do not overlap, any given operation (union, intersection, difference, symmetric difference) can be performed with a simple merge.

The merge operation is simple: simultaneously traverse the ends of both arguments in order. (Note that the endpoints of each interval list are sorted, because we require that the intervals do not overlap.) For each endpoint detected, decide whether it is the result or not. If the result currently has an odd number of endpoints and the new endpoint is not in the result, add it to the result; similarly, if the result currently has an even number of endpoints and the new endpoint is in the result, add it to the result. At the end of this operation, the result is a list of endpoints alternating between the beginning of the interval and the interval.

Here it is in python:

 # If using python 3, uncomment the following: # from functools import reduce # In all of the following, the list of intervals must be sorted and # non-overlapping. We also assume that the intervals are half-open, so # that x is in tp(start, end) iff start <= x and x < end. def flatten(list_of_tps): """Convert a list of intervals to a list of endpoints""" return reduce(lambda ls, ival: ls + [ival.start, ival.end], list_of_tps, []) def unflatten(list_of_endpoints): """Convert a list of endpoints, with an optional terminating sentinel, into a list of intervals""" return [tp(list_of_endpoints[i], list_of_endpoints[i + 1]) for i in range(0, len(list_of_endpoints) - 1, 2)] def merge(a_tps, b_tps, op): """Merge two lists of intervals according to the boolean function op""" a_endpoints = flatten(a_tps) b_endpoints = flatten(b_tps) sentinel = max(a_endpoints[-1], b_endpoints[-1]) + 1 a_endpoints += [sentinel] b_endpoints += [sentinel] a_index = 0 b_index = 0 res = [] scan = min(a_endpoints[0], b_endpoints[0]) while scan < sentinel: in_a = not ((scan < a_endpoints[a_index]) ^ (a_index % 2)) in_b = not ((scan < b_endpoints[b_index]) ^ (b_index % 2)) in_res = op(in_a, in_b) if in_res ^ (len(res) % 2): res += [scan] if scan == a_endpoints[a_index]: a_index += 1 if scan == b_endpoints[b_index]: b_index += 1 scan = min(a_endpoints[a_index], b_endpoints[b_index]) return unflatten(res) def interval_diff(a, b): return merge(a, b, lambda in_a, in_b: in_a and not in_b) def interval_union(a, b): return merge(a, b, lambda in_a, in_b: in_a or in_b) def interval_intersect(a, b): return merge(a, b, lambda in_a, in_b: in_a and in_b) 
+9
source share

This can be solved using a sweep algorithm. The idea is to save all the start points of the intervals from both sets in one sorted array and the end points in another sorted array, marking them with the information that they belong to that set. eg.

  AB [(1, 4), (7, 9)] - [(3,5)] A: start:[1,7] end:[4,9], B: start:[3]end:[5] start:[(1,a),(3,b),(7,a)] end: [(4,a),(5,b),(9,a)] 

Now you have two pointers to the beginning of each array. The increment of the loop is one that indicates the lower intervals of adding values ​​that begin with a character until they end with b or a. for example for above we will repeat the points in this order

 (1,a) (3,b) (4,a) (5,b) (7,a) (9,a) # and adding intervals where we have seen an start a and an end a or b (1,3) (7,9) 

This leads to a linear solution in the number of intervals.

+2
source share

All Articles