The best way to solve the distance problem

There are many points that are collinear. The problem is to add a new point lying on one line, so that the sum of the distance from the new point to the existing points is minimal. (Assume that the point lies in a horizontal line).

The solution I was thinking about is this:

  • Sorting points according to their x-coordinates (y does not matter in any case).
  • If the odd point is odd, place a new point with the same coordinate as the middle one.
  • Otherwise, place the point in the middle of the n / 2 point and n / 2 + 1 points.

I cannot prove that the method described above works. Is it correct? Also the best way to solve this?

+6
algorithm
source share
7 answers

I believe that the median is the correct answer if we want to minimize the sum of the absolute distances, which is an obvious interpretation. The average value is correct if we want to minimize the sum of the squared distances. For points with y = 0 and x = 0, 1, 2, 101, the average value is 26, and we can assume that the median is 1.5. The sum of the absolute distances from the average is 149, and the sum of the absolute distances from the median is 102.

In the median, the number of points to your left is the number of points to your right. Moving on the left by a small amount increases all the distances to the points on the right and reduces all distances to the points on your left by the same amount - no change. If you are at a point or more from the median, you can reduce the sum of the absolute distances by moving to the area where there are more points. This reduces the sum of the orginationg distances from the area where more points are greater than the sum of the distances originating from the area where there are fewer points. So, if you are not in the median, you can improve the situation by moving to it. This is a pretty standard result in statistics.

+8
source share

I think you can prove it by induction. I will do weird ones and you can expand it:

Without loss of generality, we can say that the points lie along the line y = 0 and that the central point is (0, 0). This is because affine transformations, such as rotations and translations, do not affect relative distances.

Let the set of points on the line be defined as P = {(x, 0) & lt = x real} Define the distance from the point X as the sum (P => | P - X |)

Lemma 1 . The center point should be along the line y = 0. Suppose that the center point is at the point (x, y) with y! = 0. Consider the point (x, 0).

  sum (P - (x, y)) = sum (sqrt ((px) * (px) + (0-y) * (0-y)))
                = sum (sqrt (p * p - 2xp + x * x + y * y))
                > sum (sqrt (p * p - 2xp + x * x + (0-0) * (0-0)))
                = sum (P - (x, 0))

This is a contradiction, therefore y = 0 must be true.

Base case of 1 element . This is an odd number of items, so select it: (0, 0). Suppose that there exists a point X = (x, 0) such that x is closer. Then this means that | x - 0 | <(0 - 0) or what | x | <0, which is impossible. Therefore (0, 0) is the central point.

A base register of three elements . This is an odd number of elements, so choose a midpoint: (0, 0). Without loss of generality, let two other points (a <0, 0) and (b> 0, 0). Suppose there is a point X = (x, 0) that is closer. Then this means that:

| x - 0 | + | x - a | + | x - b | <| 0 - 0 | + | 0 - a | + | 0 - b |

<=>

| x | + | x - a | + | x - b | <| | + | b |

But:

| x | + | x - a | + | x - b | > = | x | + | a | + | b | > = | a | + | b |, which contradicts the assumption, therefore (0, 0) is the central point.

The case with N elements (N odd) . Suppose that all odd sets of points satisfy the above conditions. Let P be a set with N elements and arrange them as follows:

{(a, 0), Q = {the set of N-2 elements centered at (0, 0)}, (b, 0)}

Suppose that the center point is X = (x, 0).

  sum (P - X) = | xa |  + | xb |  + sum (Q - X)
            > | xa |  + | xb |  + sum (Q - (0,0))
            > = | a |  + | b |  + sum (Q - (0,0))
            = sum (P - (0,0))

This means that the assumption is contrary, therefore (0,0) must be the center point.

This proves this for all odd numbers. Even numbers should be the same.

+3
source share

This is not the best solution. But gives the correct meaning.

  • First find the angle of your line, rotate all the points to the opposite angle to become a horizontal line.
  • Sort all X-points since Y will be constant after rotation
  • Let it be X (1), X (2), X (3), ... X (N)
  • Then save the Calculated distance D (R) for each R from 1 to N as [(2R-N) * X (R)] - [X (1) + X (2) + X (3) .. + X (R )] + [X (R + 1) + X (R + 2) + X (R + 3) ... + X (N)]
  • Get minimun D (R).
  • Turn back X (R), Y to the starting angle.
  • This is your expected value.
  • If incase D (R) and D (R + 1) are the same, then all the turning points between X (R), Y and X (R + 1), Y will be your expected value.
  • Interestingly, if R is the average, then the answer will be minimal, since the number of additions [X (1) + .. X (R)] and [X (R + 1) + .. X (N)] are almost equal, then the difference is minimal otherwise, if the number of additives on one side is higher, then the difference will always be greater than the number of equal additions. Similarly, if there is an even number of points, all points between (N) / 2 to (N / 2) +1 will have the same distance.
  • Therefore, MEDIAN is the correct answer.

Hope this works for you.

+1
source share

This is solved by searching for the median. See http://en.wikipedia.org/wiki/Geometric_median (Properties section). It would be enough to carry out calculations on the x or y coordinates. Or you can use it if the coordinates along the selected axis are not constant for the line.

+1
source share

An interesting problem!

Edit: The following is incorrect, but I do not understand where my error is.

The middle solution that you explained in your question is the wrong solution.

Since you want to prove the correctness, I would try to solve this problem as follows:

First of all, since all points are collinear, we can easily divide them into their X- and Y-components. Then we solve the problem independently for X and Y. Suppose we get the values V[0] to V[n-1] , where n is the number of points.

Now the problem becomes computing x, so that SUM( (V[0 ... n-1] - x)^2 ) becomes minimal. We are building the derivative 2*n*x - 2*SUM( V[0 ... n-1] ) .

This will be 0 for -n * x + SUM( V[0 ... n-1] ) . Accordingly, x = SUM( V[0 ... n-1] ) / n .

So, just add all the values ​​and divide them by n to get the right minimum solution. After that, for x and y you get the desired point.

This is also equivalent to the assumption I made when I first thought about your problem, and it works for several values ​​that I tested. Hope this helps. :)

0
source share

Above, mcdowella, FryGuy and ShreevatsaR had ideas in my answer below.

Let n be the number of points on the line. Let the points p (1), ..., p (n) be labeled from left to right.

Case n = 1. True.

Case n = 2. True. You can select any point between two points.

Case n> = 3. We introduce the coordinate system xy. Rotate the plane so that the points are on the x axis.

The point minimizing the distance to other points should be in the interval [p (1), p (n)] due to the following reasoning: if it were to the left of p (1), move it to the right a little (small enough so that x did not hit p (1)), decreasing the distance. Similarly, if it were to the right of p (n).

Select any point x on the line, not necessarily one of p (1), ..., p (n).

Let D be the sum of the distances from x to other points.

Let L be the number of points to the left of x. Similarly, let R be the number of points on the right.

There are three subcases.

Podcast 1: x is the median. So, L = R. If we move x a small amount of e to the left, the sum of the distances becomes D - Le + Re + e = D + e> D. Similarly, to move to the right. Thus, the median gives a local minimum.

Caption 2: x is to the left of the median. As in the next subframe.

Caption 3: x is to the right of the median. Thus, L> = R. There are two points such that x is in (p (i), p (i + 1)] (the left endpoint is excluded, but the right is on). Let e ​​= x - p (i) .

Move x left to e. We can make x become p (i). The sum of the distance becomes D - Le + Re = D - (L - R) e <= D. That is, we may decrease D. We do not increase it.

We continue to move x to the left, possibly decreasing the sum of the distance and not increasing it until x becomes median. Thus, the median gives a global minimum.

0
source share

Well, you certainly don't need to sort them. Just take the average of their x and y coordinates. This will work in N-dimensions if they are collinear.

EDIT: I realized that I am calculating the average. You calculate the median, as indicated in another answer, which is more likely to get the minimum distance to all points.

EDIT2: Median is the correct answer for an odd number of points. For an even number of points, this is any point along the innermost segment of the line, defined by an area with the same number of points on both sides.

Proof (ish): You found the correct answer, but there is a mulium for an even number of points.

For any two points, any point on the line segment between these points will have the sum of its distances so that both points are the same. Any point outside this line segment will have its distances greater than on the line segment.

So, to find the point with the smallest distance from all collinear points, you need to decompose the problem into the contained sets of two points, i.e. line segments that are completely contained in other line segments. Then just take the smallest line segment (or a point in the case of an odd number of points) and select a point on that line segment. All points on this line segment will have a minimum distance from all points for this particular configuration.

If you want to do this, all distances from point graphs will have the same shape: / For a line segment, the distance graph will look like this: _ /

In essence, you want to add all the distance diagrams and find the minimum.

-2
source share

All Articles