Calculate the average function of several functions

I have several ordered lists of X/Y pairs, and I want to compute an ordered list of X/Y pairs representing the average of these lists.

All of these lists (including the "average list") will then be drawn on a chart (see the example below).

I have a few problems:

  • The different lists do not have the same number of values
  • The X and Y values can increase and decrease and increase again (etc.) (see the example below)

I need to implement this in C#, though I don't think that matters much for the algorithm itself.

Example of lines

Sorry that I cannot explain my problem in a more formal or mathematical way.

EDIT: I replaced the term "function" with "X/Y pairs list," which is less confusing.

+6
c# algorithm
5 answers

I would use the method suggested by Justin, with one adjustment. He suggests a mapping table with fractional indices, whereas I would suggest integer indices. This may sound a little mathematical, but don't be ashamed to read it twice (I had to as well). Suppose the point at index i in list A has, as its nearest point in list B, the point at index j. Then, to find the nearest point in B to A[i + 1], you only need to consider points in B with an index equal to or greater than j. It will probably be j + 1, but it could also be j, or j + 2, j + 3, etc., just never lower than j. Even if the point closest to A[i + 1] had an index smaller than j, you still shouldn't use that point for interpolation, as that would produce an unexpected average and graph. I'll take a moment now to create some sample code for you. I hope you see that this optimization makes sense.

EDIT: while implementing this, I realized that j is not only bounded from below (as described above), but also bounded from above. When you compute the distance from A[i + 1] to B[j], B[j + 1], B[j + 2], etc., you can stop comparing as soon as the distance stops decreasing; it makes no sense to look any further in B. The same reasoning applies as for the lower bound: even if some point elsewhere in B is closer, it is probably not the point you want to interpolate with, and using it would lead to an unexpected, possibly less smooth graph. The added bonus of this second bound is improved performance. I created the following code:

    IEnumerable<Tuple<double, double>> Average(List<Tuple<double, double>> A, List<Tuple<double, double>> B)
    {
        if (A == null || B == null || A.Any(p => p == null) || B.Any(p => p == null))
            throw new ArgumentException();
        Func<double, double> square = d => d * d; // squares its argument
        // computes the distance from A[first argument] to B[second argument]
        Func<int, int, double> euclideanDistance = (a, b) =>
            Math.Sqrt(square(A[a].Item1 - B[b].Item1) + square(A[a].Item2 - B[b].Item2));

        int previousIndexInB = 0;
        for (int i = 0; i < A.Count; i++)
        {
            double distance = euclideanDistance(i, previousIndexInB); // distance between A[i] and B[previousIndexInB]
            for (int j = previousIndexInB + 1; j < B.Count; j++)
            {
                var distance2 = euclideanDistance(i, j); // distance between A[i] and B[j]
                if (distance2 < distance)
                {
                    // closer than the previously checked point, so keep searching
                    distance = distance2;
                    previousIndexInB = j;
                }
                else
                {
                    // stop the search and use the closest point found so far
                    break; // don't place the yield return statement here, because that could go wrong at the end of B
                }
            }
            yield return LinearInterpolation(A[i], B[previousIndexInB]);
        }
    }

    Tuple<double, double> LinearInterpolation(Tuple<double, double> a, Tuple<double, double> b)
    {
        return new Tuple<double, double>((a.Item1 + b.Item1) / 2, (a.Item2 + b.Item2) / 2);
    }

For your information, the Average function returns the same number of interpolated points as list A contains, which is probably fine, but you should think about this for your specific application. I added a few comments to clarify some details, and I described all aspects of this code in the text above. I hope this is clear; otherwise, feel free to ask questions.
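For illustration, a hypothetical call to the two-list Average function above might look like this (the input values here are made up, and one averaged point is produced per point in A, as noted above):

```csharp
using System;
using System.Collections.Generic;

var A = new List<Tuple<double, double>>
{
    Tuple.Create(0.0, 0.0), Tuple.Create(1.0, 1.0), Tuple.Create(2.0, 0.5)
};
var B = new List<Tuple<double, double>>
{
    Tuple.Create(0.0, 1.0), Tuple.Create(0.9, 1.8), Tuple.Create(2.1, 1.4), Tuple.Create(3.0, 1.0)
};

// Average is the iterator method defined above; it is evaluated lazily,
// so the pairing and interpolation happen as you enumerate it.
foreach (var p in Average(A, B))
    Console.WriteLine($"({p.Item1}, {p.Item2})");
```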

SECOND EDIT: I misunderstood and thought that you had only two lists of points. I have now created a generalized version of the function above that accepts multiple lists. It still uses only the principles described above.

    IEnumerable<Tuple<double, double>> Average(List<List<Tuple<double, double>>> data)
    {
        if (data == null || data.Count < 2 || data.Any(list => list == null || list.Any(p => p == null)))
            throw new ArgumentException();
        Func<double, double> square = d => d * d;
        Func<Tuple<double, double>, Tuple<double, double>, double> euclideanDistance = (a, b) =>
            Math.Sqrt(square(a.Item1 - b.Item1) + square(a.Item2 - b.Item2));

        var firstList = data[0];
        // the indices of the points closest to the previous point firstList[i - 1]
        // (or zero if i == 0), kept track of per list (except the first list);
        // declared outside the loop so the search in each list resumes where it left off
        int[] previousIndices = new int[data.Count];
        for (int i = 0; i < firstList.Count; i++)
        {
            // an array of points used for caching, of which the average will be yielded
            var closests = new Tuple<double, double>[data.Count];
            closests[0] = firstList[i];
            for (int listIndex = 1; listIndex < data.Count; listIndex++)
            {
                var list = data[listIndex];
                double distance = euclideanDistance(firstList[i], list[previousIndices[listIndex]]);
                for (int j = previousIndices[listIndex] + 1; j < list.Count; j++)
                {
                    var distance2 = euclideanDistance(firstList[i], list[j]);
                    if (distance2 < distance)
                    {
                        // closer than the previously checked point, so keep searching
                        distance = distance2;
                        previousIndices[listIndex] = j;
                    }
                    else
                    {
                        break; // stop the search and use the closest point found so far
                    }
                }
                closests[listIndex] = list[previousIndices[listIndex]];
            }
            yield return new Tuple<double, double>(
                closests.Select(p => p.Item1).Average(),
                closests.Select(p => p.Item2).Average());
        }
    }

In fact, having written the specific case for two lists separately first may have been a good thing: it is easier to explain and offers a stepping stone towards understanding the generalized version. In addition, the square root could be factored out, since it does not change the ordering of the distances, only their magnitude.

THIRD EDIT: in the comments it became clear that there might be a mistake. I believe there is none, apart from the small caveat already mentioned, which should only matter at the very ends of the graphs. As evidence that it really works, here is its output (the dashed line is the average):

+4

I will use a metaphor for your functions: cars racing around a winding race track, where you want to extract the center line of the track given the positions of the cars. Each car's position can be described as a function of time:

    p1(t) = (x1(t), y1(t))
    p2(t) = (x2(t), y2(t))
    p3(t) = (x3(t), y3(t))

The most important problem is that the cars are racing at different speeds, which means that p1(10) could be only half as far along the race track as p2(10). If you took the naive average of these two points and there was a sharp curve in the track between the cars, then the average could lie far off the track.

If you could just transform your functions so that they are no longer functions of time but functions of the distance along the track, then you could do what you want.

One way to do this is to choose the slowest car (i.e. the one with the largest number of samples). Then, for each sample of the slowest car's position, look at all the other cars' paths, find the two closest points, and select the point on the interpolated line between them that is closest to the slowest car's position. Then average these points together. Once you have done this for all of the slow car's samples, you have your average path.

I am assuming that all cars start and end at roughly the same places; if any of the cars only drives a small part of the track, you will need to add some extra logic to detect this.
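The approach described above can be sketched roughly as follows. This is only a sketch under my own assumptions (a simple Point record, squared-distance comparison, and the densest list taken as the "slowest car"); all names are invented for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

record Point(double X, double Y);

static class TrackAverage
{
    // Project p onto the segment a-b and return the closest point on that segment.
    static Point ClosestOnSegment(Point p, Point a, Point b)
    {
        double dx = b.X - a.X, dy = b.Y - a.Y;
        double len2 = dx * dx + dy * dy;
        if (len2 == 0) return a;
        double t = Math.Clamp(((p.X - a.X) * dx + (p.Y - a.Y) * dy) / len2, 0, 1);
        return new Point(a.X + t * dx, a.Y + t * dy);
    }

    public static List<Point> Average(List<List<Point>> tracks)
    {
        // The "slowest car": the track with the most samples serves as the reference.
        var reference = tracks.OrderByDescending(t => t.Count).First();
        var result = new List<Point>();
        foreach (var p in reference)
        {
            var matched = new List<Point> { p };
            foreach (var other in tracks.Where(t => t != reference))
            {
                // Find the closest interpolated point over all segments of the other track.
                Point best = other[0];
                double bestDist = double.MaxValue;
                for (int i = 0; i + 1 < other.Count; i++)
                {
                    var c = ClosestOnSegment(p, other[i], other[i + 1]);
                    double d = (c.X - p.X) * (c.X - p.X) + (c.Y - p.Y) * (c.Y - p.Y);
                    if (d < bestDist) { bestDist = d; best = c; }
                }
                matched.Add(best);
            }
            result.Add(new Point(matched.Average(q => q.X), matched.Average(q => q.Y)));
        }
        return result;
    }
}
```

Note that this naive sketch searches every segment of every other track for each reference sample; the improvement below avoids that.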


A possible improvement (both for performance and accuracy) is to keep track of the last sample you used for each car, and of each car's speed (relative sampling rate). For your slowest car this would be a simple map: 1 => 1, 2 => 2, 3 => 3, ... For other cars it could be more like: 1 => 0.3, 2 => 0.7, 3 => 1.6 (fractional values due to interpolation). The speed would be the reciprocal of the change in sample number (for example, the slow car would have a speed of 1, and the other car a speed of 1 / (1.6 - 0.7) = 1.11). You could then guarantee that you never accidentally step backwards on any of the cars. You could also speed up the computation, because you would not need to search the entire set of points on each path; instead, you could assume the next match will be somewhere near the current match plus 1/speed.

+4

Since these are not y = f(x) functions, maybe they are something like (x, y) = f(t)?

If so, you can interpolate along t and compute avg(x) and avg(y) for each t.

EDIT: This, of course, assumes that t can be made available to your code, so that you have an ordered list of T/X/Y triples.
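Assuming such T/X/Y triples are available, interpolating along t and averaging might be sketched like this (a rough sketch; the types and names are my own illustration, and each list is assumed sorted by T):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ParametricAverage
{
    // Linearly interpolate one T/X/Y sample list at parameter t.
    static (double X, double Y) Sample(List<(double T, double X, double Y)> curve, double t)
    {
        if (t <= curve[0].T) return (curve[0].X, curve[0].Y);
        for (int i = 0; i + 1 < curve.Count; i++)
        {
            if (t <= curve[i + 1].T)
            {
                double f = (t - curve[i].T) / (curve[i + 1].T - curve[i].T);
                return (curve[i].X + f * (curve[i + 1].X - curve[i].X),
                        curve[i].Y + f * (curve[i + 1].Y - curve[i].Y));
            }
        }
        var last = curve[^1];
        return (last.X, last.Y);
    }

    public static List<(double X, double Y)> Average(
        List<List<(double T, double X, double Y)>> curves, int steps)
    {
        double t0 = curves.Max(c => c[0].T);   // start where every curve is defined
        double t1 = curves.Min(c => c[^1].T);  // end where every curve is still defined
        var result = new List<(double, double)>();
        for (int s = 0; s <= steps; s++)
        {
            double t = t0 + (t1 - t0) * s / steps;
            var pts = curves.Select(c => Sample(c, t)).ToList();
            result.Add((pts.Average(p => p.X), pts.Average(p => p.Y)));
        }
        return result;
    }
}
```

Restricting the common parameter range to [t0, t1] sidesteps the problem of lists with different lengths: the average is only computed where all curves are defined.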

+2

There are several ways to do this. One of them is to combine all your data into one set of points and compute a best-fit curve through the combined set.
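One simple interpretation of this idea is a least-squares polynomial fit through the pooled points. Note that this only makes sense if the combined data can be treated as y = f(x), which the question says is not always the case here; the sketch below is my own illustration of the pooling idea under that assumption:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class CombinedFit
{
    // Least-squares polynomial fit through all pooled points,
    // solving the normal equations by Gaussian elimination.
    // Returns coefficients c such that y ≈ c[0] + c[1]*x + ... + c[degree]*x^degree.
    public static double[] FitPolynomial(IEnumerable<(double X, double Y)> points, int degree)
    {
        var pts = points.ToList();
        int n = degree + 1;
        var a = new double[n, n + 1]; // augmented normal-equation matrix
        for (int r = 0; r < n; r++)
        {
            for (int c = 0; c < n; c++)
                a[r, c] = pts.Sum(p => Math.Pow(p.X, r + c));
            a[r, n] = pts.Sum(p => p.Y * Math.Pow(p.X, r));
        }
        for (int col = 0; col < n; col++) // forward elimination with partial pivoting
        {
            int pivot = col;
            for (int r = col + 1; r < n; r++)
                if (Math.Abs(a[r, col]) > Math.Abs(a[pivot, col])) pivot = r;
            for (int c = 0; c <= n; c++)
                (a[col, c], a[pivot, c]) = (a[pivot, c], a[col, c]);
            for (int r = col + 1; r < n; r++)
            {
                double f = a[r, col] / a[col, col];
                for (int c = col; c <= n; c++) a[r, c] -= f * a[col, c];
            }
        }
        var coeffs = new double[n]; // back substitution
        for (int r = n - 1; r >= 0; r--)
        {
            double s = a[r, n];
            for (int c = r + 1; c < n; c++) s -= a[r, c] * coeffs[c];
            coeffs[r] = s / a[r, r];
        }
        return coeffs;
    }
}
```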

0

You have, for example, two "functions" with

    fc1 = { {1, 0.3} {2, 0.5} {3, 0.1} }
    fc2 = { {1, 0.1} {2, 0.8} {3, 0.4} }

You want the arithmetic mean (colloquially: "average") of the two functions. To get it, you simply calculate the pointwise arithmetic mean:

 fc3 = { {1, (0.3+0.1)/2} ... } 

Optimization: if you have a large number of points, you should first convert your ordered X/Y pair lists into a matrix, or at least store the points column-wise, like this: {0.3, 0.1}, {0.5, 0.8}, {0.1, 0.4}
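The pointwise mean above can be sketched as follows, assuming (as in the fc1/fc2 example) that all lists share the same X values at the same indices; the names are illustrative:

```csharp
using System.Collections.Generic;
using System.Linq;

static class PointwiseMean
{
    // Pointwise arithmetic mean of several lists that share the same X grid.
    public static List<(double X, double Y)> Average(List<List<(double X, double Y)>> fcs)
    {
        int count = fcs[0].Count;
        var result = new List<(double X, double Y)>(count);
        for (int i = 0; i < count; i++)
            result.Add((fcs[0][i].X, fcs.Average(fc => fc[i].Y))); // average the Y values at each shared X
        return result;
    }
}
```

Note that this only works under that shared-X assumption; the question's lists have different lengths and non-monotonic X, which the other answers address.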

0
