Positioning an ordered sequence of intervals for maximum consistency with another sequence of fixed intervals

Question

Positioning an ordered sequence of intervals for maximum consistency with another sequence of fixed intervals

I have two interval sequences.

The first is fixed and non-overlapping, so something like:

[1..10], [12..15], [23..56], [72..89], ...

The second is not fixed, so it is just an ordered list of interval lengths:

 [7, 2, 5, 26, ...]

The task is as follows:

Place each interval from the second list at the given starting point so that the second list becomes a list of fixed, non-overlapping intervals, like the first, while maintaining order
Find an alignment that minimizes the number of integers that are in some range from one of the lists, but not in any interval from the other list

A very simple example:

 [25..26], [58..68], [74..76], [78..86] [10, 12]

The optimal solution is to place an interval of length 10 in [58..68] and an interval of length 12 in [74..86], which leads to the fact that only numbers 25, 26 and 77 are on the same list but not on the other .

The only thing that I came up with is mildly useful: if I set the intervals in order, I know how many “fines” for the interval I have already created, so I have an upper bound for the estimate, which means that I have a valid heuristic , and I can do an A * search instead of looking at the whole tree. However, the general range of numbers is from 0 to 34 M, so I would like something better.

Any help would be hot!

+6

algorithm intervals

Jonathanr Jan 15 '14 at 20:52

source share

1 answer

David knipe · Accepted Answer · 2014-01-15T23:24:52+0000

OK, here is a half-bloated answer. It should work in polynomial time, but I did not bother to check what the index is. You can probably get a better index than the answer described here. Details are left as an exercise for the reader :-) I hope this is not too obscure.

I define the solution score as the number of integers that appear in both lists of intervals. Let f (i, m) be the highest score that can be obtained using only the first length of interval i, provided that none of your intervals exceeds m. The function f for a fixed self is essentially a (not strictly) increasing function from integers to a limited subset of integers. Therefore:

all values of f (i, m) for m> 0 are equal with a finite number of exceptions:
all values of f (i, m), for m <0, are equal, with a finite number of exceptions.

This means that all f (i, m) values can be represented using a finite data structure (still considering a fixed value of i).

Now let F (i) be the value of this data structure representing all the values of f (i, m). I argue that given F (i), one can compute F (i + 1). To do this, we only need to answer the following question for all x: if I put the new interval at x, how good is the best solution I can get? But we know what it is - it's just f (i, x) + the estimate we got from this interval.

So, if n is the number of intervals in the second list, the estimate of the best solution will be F (n).

To find a solution, you could step back from that.

You know what the best result you can get. Say it s_0. Then place the last interval as far as possible, provided that it allows you to score s_0. That is, we find the smallest m such that f (n, m) = s_0; and position the interval so that it only remains inside the border at point m.

Then let s_1 be the score you need to get from all other intervals to get a total of s_0. Place the next last interval as far as possible, provided that you can still score s_1. That is, we find the smallest m such that f (n, m) = s_1; and position the interval so that it only remains inside the border at point m.

And so on...

Positioning an ordered sequence of intervals for maximum consistency with another sequence of fixed intervals

More articles: