The maximum difference between numbers in a sequence

I need help finding a general idea for an algorithm to solve the following problem. The problem was asked me in the task. It seems like it should be resolved using the greedy method, but I cannot find a simple solution. Here is a description of the problem:

You are given a sequence of N numbers a_1 ... a_n such that 0 = a_1 < a_2 < ... < a_n . You must eliminate no more than M of these numbers in order to maximize the minimum difference a_i+1 - a_i between any two consecutive numbers.

You cannot delete the first and last elements, a_0 and a_n . You should also eliminate as few numbers as possible: if you eliminate M - 1 , you get the shortest distance D and excluding M , you should still have the same minimum difference, you shouldn't eliminate this last number.

I am not asking for a complete solution to this problem. Only a few recommendations on how the algorithm looks.

Edit: Some test samples. Keep in mind that there may be several valid solutions.

 Remove at most 7 from: 0 3 7 10 15 18 26 31 38 44 53 60 61 73 76 80 81 88 93 100 Solution: 0 7 15 26 31 38 44 53 60 73 80 88 93 100 
 Remove at most 8 from: 0 3 7 10 15 26 38 44 53 61 76 80 88 93 100 Solution: 0 15 38 53 76 88 100 
+8
language-agnostic algorithm recurrence
source share
4 answers

[EDIT: I initially claimed that ElKamina's answer was wrong, but now I’ve made sure that it’s not only correct, but almost the same as my (later) answer: -P Still a bit short for my taste, though!]

There exists an exact O (NM ^ 2) -time, O (NM) -space dynamic programming algorithm that receives the correct answer to all OP examples in milliseconds. The main ideas are as follows:

  • Whenever we impose a restriction on the fact that a particular number should not be deleted, it forms a “fence” between two subtasks in such a way that solving each subtask optimally guarantees an optimal solution to the general problem with this restriction in place and
  • Each optimal solution should start with an unused number, followed by a number of consecutive remote numbers, followed by an unused number, followed by an optimal solution for the rest of the problem, which starts with a second undeleted number and uses a reduced M.

Further x [i] means the i-th number in the list with indexing, starting from 0.

Recursion

Let f (i, j) be the optimal (largest minimum) interval size obtained from the suffix of the list of numbers, starting from position 0 <= i <N, holding this (i.e., i-th) number and deleting exactly j later (not necessarily consecutive) numbers. The following recursion shows how this can be calculated:

 f(i, j) = max(g(i, j, d)) over all 0 <= d <= min(j, Ni-2) g(i, j, d) = min(x[i+d+1] - x[i], f(i+d+1, jd)) 

min(j, Ni-2) exists instead of plain j to prevent minus end removal. The only basic cases we need are:

 f(N-1, 0) = infinity (this has the effect of making min(f(N-1), 0), z) = z) f(N-1, j > 0) = 0 (this case only arises if M > N - 2) 

How it works

In more detail, in order to calculate f (i, j), we cycle through all possible numbers (including zero) of consecutive deletions, starting from position i + 1, in each case calculating the minimum (a) the interval formed by this block of deletions, and ( b) the optimal solution to the subtask starting from the first reconstructed number to the right of this block. It is important to indicate that the first number in the block (x [i]) is not deleted, therefore the previous (parent) interval of the subtask is always "limited". This is the difficult part, which took time to understand.

Fast execution

If you copy the simple recursion above, it will work, but it will take exponentially in M ​​and N. Under memoising f (), we guarantee that its body will work no more than N * M times (once for a separate combination of parameters). Each time the function is launched, it performs an O (M) scan of the work through ever longer deletion blocks, for a total time of O (NM ^ 2).

You cannot create a large space using fewer deletions, so the total maximum size of the minimum interval can be found by looking at the results M + 1 f (0, M), f (0, M-1), ..., f (0, 0) for the first number that is less than the previous number: the previous number is the answer, and the second argument f () is the minimum number of deletions required. To find the optimal solution (that is, the List of deleted specific numbers), you can record the decisions made in a separate array of predecessors, so p [i, j] gives the value d (which can be converted to the previous values ​​i and j), which led to the optimal solution for f (i, j). (Perhaps the "predecessor" is confused here: it refers to subtasks that are solved before the current subtask, although these subtasks appear "after" (on the right) the suffix representing the current subtask.) Then, these links can be restored to the decisions made delete / don't- delete

C ++ working code

http://ideone.com/PKfhDv

Addendum: Early Mistakes

With such a complex problem, it can be useful to look at the wrong approaches and see exactly where they went wrong ...: - / I thought I solved the problem, but I did not notice the requirement to return a solution that uses as few deletes as possible, and my initial attempts to fix this did not help.

First, I tried to define f (i, j) as the optimal (largest minimum) interval size obtained from the suffix of the list of numbers, starting from position 0 <= i <N, preserving this (i.e., ith) number and removing from 0 to j later (not necessarily consecutive) numbers. But this caused a subtle problem: it is not necessary that you can assemble the optimal solution for this from the optimal solutions for subtasks. Initially, I thought that this could be fixed by changing the function to return a pair (the size of the interval, the minimum number of deletions needed to achieve the size of this interval) instead of the size of the interval, and breaking the links between actions that share the maximum minimum interval size, always choosing an action that minimizes the number of deletions. But in general, this is not true, because the optimal solution to the subtask (i.e., a certain suffix of the list of numbers) will carry out deletions, making the minimum interval size in this region as large as possible, even if these deletions are wasted because the prefix of the complete solution will cause the total minimum to be lower anyway. Here's a counter example using f () that returns (interval size, minimum number of deletes needed to achieve this size):

 Problem: M = 1, X = [10 15 50 55]. f(2, 0) = (5, 0) (leaving [50 55]) f(1, 1) = (40, 1) (delete 50 to leave [15 55]); *locally* this appears better than not deleting anything, which would leave [15 50 55] and yield a min-gap of 5, even though the latter would be a better choice for the overall problem) f(0, 1) = max(min(5, f(1, 1)), min(40, f(2, 0)) = max(min(5, 40), min(40, 5)) = (5, 1) (leaving either [10 15 55] or [10 50 55]) 

I did not show work for the second element of the pair returned by f (0, 1), because it is hard to express succinctly, but obviously it will be 1, because both alternative attempts need 1 deletion.

+5
source share

Use dynamic programming.

Clue X (i, j) contains the minimum distance with the first i-elements and among them j is selected (that is, not deleted).

This will give you the exact solution. Complexity = O (MN ^ 2), because for each value I have only M valid values ​​of j, and each function call must perform O (M).

Let elements A1, A2, ..., An

Formula to update:

X (i + 1, j + 1) = Max (Min (A (i + 1) -Ak, Xkj) for k <= i)

[Edited by j_random_hacker to add information from comments]

+4
source share

I think I got a solution. At least he is working on two sets of samples. It does not necessarily return the same set as the answer, but the set that it returns has the same minimum value. It is iterative and greedy, which is also nice, and there are tons of ways to optimize it. It looks like MLog (N).

It is important to understand that the numbers do not matter - only the differences between them. When you “delete a number”, you are really just combining two adjacent differences. Then my algorithm will focus on the differences. It is a simple question to go back to what elements caused these differences and remove them when you go.

Algorithm

  • Create an ordered list / array of differences between each number.
  • Find the lowest difference x. If the quantity x> the remaining M, we stop. You are already at best.
  • For each value of x starting with the leftmost one, combine this difference with the next lower one (and delete that x). If the neighbors have equal values, your decision is arbitrary. If only one neighbor has an x ​​value, combine it with another neighbor. (If you have no choice, for example, [1, 1, ...], then in combination with matching X, but avoid it if you can.)
  • Return to step 2 until you finish M.

Algorithm Notes

In step 3, there is a point that I designated as an arbitrary solution. It probably shouldn't be, but you get into quite rare cases when it comes to how many difficulties you want to add. This arbitrariness allows us to solve several different correct answers. If you see two neighbors having the same value, at the moment I am saying randomly choose one. Ideally, you should probably check out a pair of neighbors that are at 2, then 3, etc., and prefer the lower one. I'm not sure what to do if you get to the edge when expanding. Ultimately, to make this part great, you may need to call this function recursively and see which grades are better.

Passing sample data

Second first:

Delete no more than 8 from: 0 3 7 10 15 26 38 44 53 61 76 80 88 93 100

[3, 4, 3, 5, 11, 12, 6, 9, 8, 15, 4, 8, 5, 7] M = 8

Delete 3. Edge deletions can only be added in one direction:

[7, 3, 5, 11, 12, 6, 9, 8, 15, 4, 8, 5, 7] M = 7

[7, 8, 11, 12, 6, 9, 8, 15, 4, 8, 5, 7] M = 6

Then delete 4: [7, 8, 11, 12, 6, 9, 8, 15, 12, 5, 7] M = 5

Then delete 5: [7, 8, 11, 12, 6, 9, 8, 15, 12, 12] M = 4

Then delete 6: [7, 8, 11, 12, 15, 8, 15, 12, 12] M = 3

Then delete 7: [15, 11, 12, 15, 8, 15, 12, 12] M = 2

Then delete 8: [15, 11, 12, 15, 23, 12, 12] M = 1 // note an arbitrary decision about the direction of adding

Finally, delete 11: [15, 23, 15, 23, 12, 12]

Please note that in the answer the lowest difference is 12.

First last

Delete no more than 7 from: 0 3 7 10 15 18 26 31 38 44 53 60 61 73 76 80 81 88 93 100

[3, 4, 3, 5, 3, 8, 5, 7, 6, 9, 7, 1, 12, 3, 4, 1, 7, 5, 7] M = 7

Delete 1:

[3, 4, 3, 5, 3, 8, 5, 7, 6, 9, 8, 12, 3, 4, 1, 7, 5, 7] M = 6

[3, 4, 3, 5, 3, 8, 5, 7, 6, 9, 8, 12, 3, 5, 7, 5, 7] M = 5

There are 4 3 left, so we can remove them:

[7, 3, 5, 3, 8, 5, 7, 6, 9, 8, 12, 3, 5, 7, 5, 7] M = 4

[7, 8, 3, 8, 5, 7, 6, 9, 8, 12, 3, 5, 7, 5, 7] M = 3

[7, 8, 11, 5, 7, 6, 9, 8, 12, 3, 5, 7, 5, 7] M = 2 // Note the arbitrary addition on the right

[7, 8, 11, 5, 7, 6, 9, 8, 12, 8, 5, 7, 5, 7] M = 1

We would delete the 5 following, but only 1 is allowed to be deleted, and there are 3 of them, so we stop here. Our lowest difference is 5, corresponding to the decision.

Note Edited from the idea of ​​combining identical X values ​​to avoid this, for case 1, 29, 30, 31, 59 presented by SauceMaster.

+1
source share

I was hoping not to use an approach using all combinations, but after several attempts it seemed the only way to match my results with j_random_hacker's. (Some of the comments below relate to earlier versions of this answer.) I am impressed with how concisely the j_random_hacker / ElKamina algorithm is expressed in Haskell ('jrhMaxDiff'). Its "compareAllCombos" function looks for differences in the results of our two methods:

 *Main> compareAllCombos 7 4 4 Nothing 


Algorithm:

 1. Group the differences: [0, 6, 11, 13, 22] => [[6],[5],[2],[9]] 2. While enough removals remain to increase the minimum difference, extend the minimum difference to join adjacent groups in all possible ways: [[6],[5],[2],[9]] => [[6],[5,2],[9]] and [[6],[5],[2,9]]...etc. Choose the highest minimum difference and lowest number of removals. 


Haskell Code:

 import Data.List (minimumBy, maximumBy, groupBy, find) import Data.Maybe (fromJust) extendr ind xs = let splitxs = splitAt ind xs (y:ys) = snd splitxs left = snd y right = snd (head ys) in fst splitxs ++ [(sum (left ++ right), left ++ right)] ++ tail ys extendl ind xs = let splitxs = splitAt ind xs (y:ys) = snd splitxs right = snd y left = snd (last $ fst splitxs) in init (fst splitxs) ++ [(sum (left ++ right), left ++ right)] ++ tail (snd splitxs) extend' m xs = let results = map (\x -> (fst . minimumBy (\ab -> compare (fst a) (fst b)) $ x, x)) (solve xs) maxMinDiff = fst . maximumBy (\ab -> compare (fst a) (fst b)) $ results resultsFiltered = filter ((==maxMinDiff) . fst) results in minimumBy (\ab -> compare (sum (map (\x -> length (snd x) - 1) (snd a))) (sum (map (\x -> length (snd x) - 1) (snd b)))) resultsFiltered where solve ys = let removalCount = sum (map (\x -> length (snd x) - 1) ys) lowestElem = minimumBy (\ab -> compare (fst a) (fst b)) ys lowestSum = fst lowestElem lowestSumGrouped = map (\x -> if (fst . head $ x) == 0 then length x else if null (drop 1 x) then 1 else if odd (length x) then div (length x + 1) 2 else div (length x) 2) $ filter ((==lowestSum) . fst . head) (groupBy (\ab -> (fst a) == (fst b)) ys) nextIndices = map snd . filter ((==lowestSum) . fst . fst) $ zip ys [0..] lastInd = length ys - 1 in if sum lowestSumGrouped > m - removalCount || null (drop 1 ys) then [ys] else do nextInd <- nextIndices if nextInd == 0 then solve (extendl (nextInd + 1) ys) else if nextInd == lastInd then solve (extendr (nextInd - 1) ys) else do a <- [extendl nextInd ys, extendr nextInd ys] solve a pureMaxDiff m xs = let differences = map (:[]) $ tail $ zipWith (-) xs ([0] ++ init xs) differencesSummed = zip (map sum differences) differences result = extend' m differencesSummed lowestSum = fst result removalCount = sum (map (\x -> length (snd x) - 1) (snd result)) in if null (filter ((/=0) . fst) differencesSummed) then (0,0) else (removalCount, lowestSum) -- __j_random_hacker stuff begins here -- My algorithm from http://stackoverflow.com/a/15478409/47984. -- Oddly it seems to be much faster when I *don't* try to use memoisation! -- (I don't really understand how memoisation in Haskell works yet...) jrhMaxDiff m xs = fst $ fromJust $ find (\(x, y) -> snd x > snd y) resultPairsDesc where inf = 1000000 n = length xs fij = if i == n - 1 then if j == 0 then inf else 0 else maximum [gijd | d <- [0 .. min j (n - i - 2)]] gijd = min ((xs !! (i + d + 1)) - (xs !! i)) (f (i + d + 1) (j - d)) resultsDesc = map (\i -> (i, f 0 i)) $ reverse [0 .. m] resultPairsDesc = zip resultsDesc (concat [(tail resultsDesc), [(-1, -1)]]) -- All following code is for looking for different results between my and groovy algorithms. -- Generate a list of all length-n lists containing numbers in the range 0 - d. upto 0 _ = [[]] upto nd = concat $ map (\x -> (map (\y -> (x : y)) (upto (n - 1) d))) [0 .. d] -- Generate a list of all length-maxN or shorter lists containing numbers in the range 0 - maxD. generateAllDiffCombos 1 maxD = [[x] | x <- [0 .. maxD]] generateAllDiffCombos maxN maxD = (generateAllDiffCombos (maxN - 1) maxD) ++ (upto maxN maxD) diffsToNums xs = scanl (+) 0 xs generateAllCombos maxN maxD = map diffsToNums $ generateAllDiffCombos maxN maxD -- generateAllCombos causes pureMaxDiff to produce an error with (1, [0, 0]) and (1, [0, 0, 0]) among others, -- so filter these out to look for more "interesting" differences. --generateMostCombos maxN maxD = filter (\x -> length x /= 2) $ generateAllCombos maxN maxD generateMostCombos maxN maxD = filter (\x -> length x > 4) $ generateAllCombos maxN maxD -- Try running both algorithms on every list of length up to maxN having gaps of -- size up to maxD, allowing up to maxDel deletions (this is the M parameter). compareAllCombos maxN maxD maxDel = find (\(x, maxDel, jrh, grv) -> jrh /= grv) $ map (\x -> (x, maxDel, jrhMaxDiff maxDel x, pureMaxDiff maxDel x)) $ generateMostCombos maxN maxD -- show $ map (\x -> (x, jrhMaxDiff maxDel x, pureMaxDiff maxDel x)) $ generateMostCombos maxN maxD 
+1
source share

All Articles