Positioning an ordered sequence of intervals for maximum consistency with another sequence of fixed intervals

I have two interval sequences.

The first is fixed and non-overlapping, so something like:

[1..10], [12..15], [23..56], [72..89], ... 

The second is not fixed, so it is just an ordered list of interval lengths:

 [7, 2, 5, 26, ...] 

The task is as follows:

  • Place each interval from the second list at the given starting point so that the second list becomes a list of fixed, non-overlapping intervals, like the first, while maintaining order
  • Find an alignment that minimizes the number of integers that are in some range from one of the lists, but not in any interval from the other list

A very simple example:

 [25..26], [58..68], [74..76], [78..86] [10, 12] 

The optimal solution is to place an interval of length 10 in [58..68] and an interval of length 12 in [74..86], which leads to the fact that only numbers 25, 26 and 77 are on the same list but not on the other .

The only thing that I came up with is mildly useful: if I set the intervals in order, I know how many β€œfines” for the interval I have already created, so I have an upper bound for the estimate, which means that I have a valid heuristic , and I can do an A * search instead of looking at the whole tree. However, the general range of numbers is from 0 to 34 M, so I would like something better.

Any help would be hot!

+6
source share
1 answer

OK, here is a half-bloated answer. It should work in polynomial time, but I did not bother to check what the index is. You can probably get a better index than the answer described here. Details are left as an exercise for the reader :-) I hope this is not too obscure.

I define the solution score as the number of integers that appear in both lists of intervals. Let f (i, m) be the highest score that can be obtained using only the first length of interval i, provided that none of your intervals exceeds m. The function f for a fixed self is essentially a (not strictly) increasing function from integers to a limited subset of integers. Therefore:

  • all values ​​of f (i, m) for m> 0 are equal with a finite number of exceptions:
  • all values ​​of f (i, m), for m <0, are equal, with a finite number of exceptions.

This means that all f (i, m) values ​​can be represented using a finite data structure (still considering a fixed value of i).

Now let F (i) be the value of this data structure representing all the values ​​of f (i, m). I argue that given F (i), one can compute F (i + 1). To do this, we only need to answer the following question for all x: if I put the new interval at x, how good is the best solution I can get? But we know what it is - it's just f (i, x) + the estimate we got from this interval.

So, if n is the number of intervals in the second list, the estimate of the best solution will be F (n).

To find a solution, you could step back from that.

You know what the best result you can get. Say it s_0. Then place the last interval as far as possible, provided that it allows you to score s_0. That is, we find the smallest m such that f (n, m) = s_0; and position the interval so that it only remains inside the border at point m.

Then let s_1 be the score you need to get from all other intervals to get a total of s_0. Place the next last interval as far as possible, provided that you can still score s_1. That is, we find the smallest m such that f (n, m) = s_1; and position the interval so that it only remains inside the border at point m.

And so on...

+1
source

All Articles