The intuitive guess is that the "center" of the optimal sequence will be the arithmetic mean, but this is not so. Find the right solution with some vector math:
Part 1: Assuming that the first number will be left alone (we will consider this assumption later), calculate the differences, so 1 12 3 14 5 16 - 1 2 3 4 5 6 will give 0 -10 0 -10 0 -10 .
sidenote: Note that an “adjacent” array, by your implied definition, will be an increasing arithmetic sequence with a difference of 1. (Note that there are other reasonable interpretations of your question: some people may consider 5 4 3 2 1 adjacent or 5 3 1 be adjacent or 1 2 3 2 3 be adjacent. You also did not indicate whether negative numbers should be treated differently.)
: Adjacent numbers must lie between the minimum and maximum numbers. [proof left to reader]
Part 2: Now back to our example, assuming that we have completed 30 steps (sum (abs ( 0 -10 0 -10 0 -10 )) = 30) to turn 1 12 3 14 5 16 into 1 2 3 4 5 6 . This is one correct answer. But 0 -10 0 -10 0 -10 + c is also an answer that gives an arithmetic sequence of difference 1 for any constant c. In order to minimize the number of “steps”, we must choose the appropriate c. In this case, every time we increase or decrease c, we increase the number of steps by N = 6 (vector length). So, for example, if we wanted to turn our original sequence 1 12 3 14 5 16 into 3 4 5 6 7 8 (c = 2), then the differences would be 2 -8 2 -8 2 -8 and sum(abs(2 -8 2 -8 2 -8))=30 .
Now it is very clear if you can visualize it, but it is difficult to type in the text. First we took our difference vector. Imagine you drew it like this:
4| 3| * 2| * | 1| | | * 0+--+--+--+--+--* -1| | -2| *
We can “shift” this vector up and down by adding or subtracting 1 from everything. (This is equivalent to finding c.) We want to find a shift that minimizes the number | that you see (the area between the curve and the x axis). This is NOT average (this will be the minimization of the standard deviation or RMS error , not the absolute error). To find the minimization of c, we consider this as a function and consider its derivative. If the differences are all far from the x axis (we are trying to make 101 112 103 114 105 116 ), it makes sense just not to add this additional material, so we shift the function down along the x axis. Each time we decrease c, we improve the solution by 6. Now suppose that one of * passes the x axis. Each time we decrease c, we improve the solution by 5-1 = 4 (we save 5 work steps, but must perform 1 additional step for * below the x axis). In the end, when HALF * passes along the x axis, we SHOULD NOT IMPROVE THE SOLUTION (derivative: 3-3 = 0). (In fact, soon we begin to make the decision worse and never improve it again. We not only found the minimum of this function, but also see that this is a global minimum.)
So the solution is this: Pretend the first number in place. Calculate the vector of differences. Minimize the sum of the absolute value of this vector; do this by finding the median DIFFERENCES and subtracting it from the differences to get an improved vector of differences. The sum of the absolute value of the "improved" vector is your answer. This is O(N) . Equal optimality solutions will (as indicated above) always be “adjoining”. The only solution exists only if there is an odd number of numbers; otherwise, if there is an even number of numbers, and the median difference is not an integer, then in equally optimal solutions there will be difference vectors with correcting factors of any number between the two medians.
So, I think that this would not be complete without a final example.
- input:
2 3 4 10 14 14 15 100 - vector of differences:
2 3 4 5 6 7 8 9 - 2 3 4 10 14 14 15 100 = 0 0 0 -5 -8 -7 -7 -91 - note that the medians of the difference vector are no longer in the middle, we need to run the search algorithm median
O(N) to extract them ... - the medians of the difference vector
-5 and -7 - take -5 as our correction factor (any number between medians, such as -6 or -7, would also be the right choice)
- Thus, our new goal is
2 3 4 5 6 7 8 9 + 5 = 7 8 9 10 11 12 13 14 , and the new differences are 5 5 5 0 -3 -2 -2 -86 * - this means that we need to perform
5+5+5+0+3+2+2+86 = 108 steps
* (we get this by repeating step 2 with our new goal or adding 5 to each number of the previous difference ... but since you only care about the amount, we just add 8 * 5 (vector length multiplies the correct coefficient) by the previously calculated amount )
As an alternative, we could also take -6 or -7 as our correction factor. Say we took -7 ...
- then the new goal would be
2 3 4 5 6 7 8 9 + 7 = 9 10 11 12 13 14 15 16 and the new differences would be 7 7 7 2 1 0 0 -84 - this would mean that we would need to do 7 + 7 + 7 + 2 + 1 + 0 + 0 + 84 = 108 steps, the same as above
If you yourself imitate this, you can see that the number of steps becomes> 108, as we make offsets further from the range [-5, -7].
pseudo code:
def minSteps(array A of size N): A' = [0,1,...,N-1] diffs = A'-A medianOfDiffs = leftMedian(diffs) return sum(abs(diffs-medianOfDiffs))
Python:
leftMedian = lambda x:sorted(x)[len(x)//2] def minSteps(array): target = range(len(array)) diffs = [ta for t,a in zip(target,array)] medianOfDiffs = leftMedian(diffs) return sum(abs(d-medianOfDiffs) for d in diffs)
change
It turns out that for arrays of different integers this is equivalent to a simpler solution : choosing one of (up to 2) medians, assuming that they do not move, and accordingly move the other numbers. This simple method often gives incorrect answers if you have duplicates, but the OP did not ask about it, so this would be a simpler and more elegant solution. In addition, we can use the proof that I gave in this decision to justify the “suppose that the median is not moving” solution as follows: the correction factor will always be in the center of the array (that is, the median of differences will be from the median of numbers). Thus, any restriction that also guarantees this can be used to create variations of this brainteaser.