Looking for interval overlap in the interval list?

Question

Looking for interval overlap in the interval list?

Say [a, b] represents the interval on the real line from a to b, a <b, inclusive (that is, [a, b] = the set of all x such that a <= x <= b). Furthermore, let's say [a, b] and [c, d] are “overlapping” if they share any x, so that x is in both [a, b] and [c, d].

Given the list of intervals, ([x1, y1], [x2, y2], ...), what is the most efficient way to find all those intervals that overlap with [x, y]?

Obviously, I can try each one and get it in O (n). But I was wondering if I can sort the list of intervals in some smart way, I could find / one / overlapping element in O (log N) through a binary search, and then “look back” from that position in the list to find everything overlapping intervals. However, how do I sort the intervals so that such a strategy works?

Note that there may be overlap between the elements in the list items, which makes this difficult.

I tried this, sorting the intervals by their left edge, right, middle, but none of them leads to an exhaustive search.

reference

+52

algorithm

cespinoza Dec 15 2018-10-12T00:

source share

7 answers

For completeness, I would like to add that there is a known data structure only for this problem, known (surprise, surprise) as an interval tree . This is basically an extended balanced tree (red-black, AVL, your choice), which stores intervals sorted by their left (bottom) endpoint. The increase is that each node stores the largest right (highest) endpoint in its subtree. This tree allows you to find all overlapping time intervals O (log n).

It is described in CLRS 14.3.

+49

Ben Jan 14 2018-12-12T00:

source share

A 'quadtree' is a data structure that is often used to improve collision detection performance in two dimensions.

I think you could create a similar 1st structure. This will require some preliminary calculation, but should lead to O (log N) performance.

Basically, you start with the root 'node', which covers all possible intervals, and when you add a node to the tree, you decide whether it falls to the left or right of the midpoint. If it crosses the midpoint, you break it into two intervals (but write down the original parent) and recursively come from there. You can set a limit on the depth of the tree, which can save memory and improve performance, but this happens due to the complexity of things (you need to save a list of intervals in your nodes).

Then, checking the interval, you basically find all the leaf nodes into which it will be inserted, whether they were inserted, check the partial intervals inside these nodes for intersection, and then report that the interval written against them as "source" , ancestor.

+4

sje397 Dec 15 '10 at 2:21

source share

Just a quick thought "with a cuff", so to speak.

Could you organize them into 2 lists, one for the beginning of the intervals, and another for the end of the intervals.

That way, you can compare y with the elements at the beginning of the interval list (for example, binary search) to narrow down candidates based on this.

You can then compare x with the elements at the end of the interval list.

EDIT

Case: after shutdown

If you compare only one interval with a list of intervals in a single situation, I do not think that sorting will help you since the ideal sorting is O (n) .

By doing a linear search on all x to trim any impossible intervals, doing another linear search on the remaining y, you can reduce your overall work. Although this is still O (n), without it you would do 2n comparisons, while on average you would do (3n-1) / 2 comparisons this way.

I believe this is the best you can do for an unsorted list.

Case: pre-sorting does not count

If you re-compare individual intervals with this list of intervals and pre-sort the list, you can achieve better results. This process is still applied, but by doing a binary search in the first list, and then you can get O (m log n) as opposed to O (mn), where m is the number of unit intervals being compared. Note that still gives you the advantage of reducing overall comparisons. [2m log n compared to m (3 (log n) - 1) / 2]

+1

Dan McGrath Dec 15 '10 at 2:20

source share

You can sort both left and right at the same time and use both lists to exclude any overlapping values. If the list is sorted by the left end, then none of the intervals to the right of the right end of the test range can overlap. If the list is sorted to the right, then none of the intervals to the left of the left end of the test range can overlap.

For example, if the intervals

[1,4], [3,6], [4,5], [2,8], [5,7], [1,2], [2,2.5]

and you will find a match with [3,4] , then sorting by the left edge and marking the position of the right end of the test (the right end is just greater than its value, so 4 included in the range)

 [1,4], [1,2], [2,2.5], [2,8], [3,6], [4,5], *, [5,7]

you know that [5,7] cannot overlap, then sort by the right edge and mark the position of the left end of the test

 [1,2], [2,2.5], *, [1,4], [4,5], [3,6], [5,7], [2,8]

you know that [1,2] and [2,2.5] cannot intersect

Not sure how effective it will be, since you need to do two kinds and searches.

0

Nemo157 Dec 15 '10 at 2:21

source share

As you can see in other answers, most algorithms are combined with a special data structure. For example, for an unsorted list of intervals as input, O(n) best obtained. (And, as a rule, it is easier to think in terms of the data structure that the algorithm dictates).

In this case, your question is not complete:

Did you get the whole list, or did you actually create it?
Do you need to perform only one such search or many of them?
Do you have any estimates for the operations that it should support, and their frequencies?

For example, if you need to perform only one such search, then you should not sort the list earlier. If there are many, then depreciation will be more expensive sorting or generating "1D quadtree".

However, this would be difficult to solve, because a simple quadria (as I understand it) is able to simply detect collistion, but cannot create a list of all segments that overlap with your input.

One simple implementation will be an ordered (by agreement) list, in which you insert all the ends of the segment from the beginning / end of the flag and with the segment number. Thus, by analyzing it (still O (n), but I doubt that you can do it faster if you also need a list of all segments that overlap) and save the track of all open segments that were not closed with “breakpoints” .

0

ruslik Dec 15 2018-10-12T00:

source share

C ++ program merges overlapping intervals
Uses merge sort
Examples of inversions: (8, 9); (3, 4); (2, 5); (12); (6, 7)
Answer: (1, 5); (6, 7); (8, 9)
Requirement: start <end, i.e. (6, 5), is not a valid interval

See Code

-2

Gulzada Serzhan Apr 17 '15 at 0:25

source share

gdj · Accepted Answer · 2010-12-15 02:23

[a, b] overlaps with [x, y] if b> x and a <y. Sorting the intervals by their first elements gives you the intervals corresponding to the first condition during the log. Sorting the intervals by their last elements gives you the intervals corresponding to the second condition during the log. Take the intersection of the result sets.

Looking for interval overlap in the interval list?

More articles: