Segmented least squares algorithm, do not understand this concept of dynamic programming at all

Question

Segmented least squares algorithm, do not understand this concept of dynamic programming at all

I have been trying to implement this algorithm in Python for several days. I go back to him all the time and just refuse disappointment. I do not know what's happening. I don’t have anyone to ask or ask for help somewhere so I can come here.

Warning in PDF: http://www.cs.uiuc.edu/class/sp08/cs473/Lectures/lec10.pdf

I do not think that he was clearly explained, I definitely do not understand.

My understanding of what is going on:

We have many points (x1, y1), (x2, y2) .. and we want to find some lines that best fit this data. We can have several straight lines, and these lines come from the given forums for a and b (y = ax + b).

Now the algorithm starts at the end (?) And assumes that the point p (x_i, y_i) is part of the line segment. Then the notes say that the optimal solution is "the optimal solution for {p1,., Pi-1} plus the (best) line through {pi,., Pn}". Which for me simply means that we go to the point p (x_i, y_i) and go back and find another line segment through the other points. Now the optimal solution is both of these segments.

Then it takes a logical transition, which I cannot follow, and says: "Suppose that the last point pn is part of a segment that starts with p_i. If Opt (j) denotes the cost of the first j points and e (j, k) is an error the best line through the points j to k, then Opt (n) = e (i, n) + C + Opt (i - 1) "

Then there is pseudo code for an algorithm that I do not understand. I understand that we want to iterate over the list of points and find points that minimize the OPT (n) formula, but I just don't follow it. It makes me feel stupid.

I know that this question is a pain in the ass, and that it is not easy to answer, but I'm just looking for some recommendations for understanding this algorithm. I apologize for the PDF, but I don't have a more accurate way to get critical information for the reader.

Thanks for your time and by reading this, I appreciate it.

+7

python algorithm implementation

gurk Nov 03 '10 at 5:28

source share

4 answers

Justin peel · Answer 1 · 2010-11-03T06:11:23+0000

The hard part, the dynamic programming part, is the section

for j = 1 to n M[j] = min_(1=<i=<j) ( e(i,j) + C + M[i-1])

where M[0] = 0 is performed earlier, and n is the total number of data points. In addition, underscores mean that the section in parentheses must subsequently be indexed. The professor could well use OPT instead of M, and this is done in other lectures of other universities about this, which you can find on the Internet (and which look almost identical). Now, if you look at my block of code above and in the lecture, you will notice the difference. I used M[i-1] and your professor used M[j-1] . This is a typo in your professor pseudo-code. You can even look back at the slide earlier and see that it fixed it in the error function.

Basically, for each j you find the point i to draw a line j from such that its error, plus the cost of adding this extra line (C), plus the cost of creating all the segments of the line up to i (which was optimally selected) is minimized.

In addition, remember that e(i,i) = 0 , as well as e(i,i+1) , because setting the line to a point does not give an error, as well as just two points.

Cyclone · Answer 2 · 2010-11-03T07:19:19+0000

Starting from point 1, the best solution up to point j should include a new endpoint j in the last segment of the line, so the problem is where should I place the last split to minimize the cost of this last segment of the line?

Fortunately, the cost is calculated from the point of view of subtasks of the same problem that you are trying to solve, and, fortunately, you have already solved these smaller subtasks by the time you move to the next point j. Thus, at the new point j, you are trying to find the optimal point i between points 1 and j in order to separate a new segment of the line, which includes j, and minimizes the cost: optim_cost_up_to (i) + cost_of_split + cost_of_lsq_fit (i + 1, ). Now the confusing part is that at any moment it may seem that you just find one split, but in fact all the previous splits are determined by optim_cost_up_to (i), and since you have already calculated the optimal cost for all these points leading to j, then you just need to remember the answers so that the algorithm does not digest these costs every time it advances a point.

In python, I would probably use a dictionary to store the results, although it might be better for this dynamic programming algorithm ...

anyway...

  def optimalSolution(points,split_cost) solutions = {0:{'cost':0,'splits':[]}} for j in range(1,len(points)): best_split = None best_cost = lsqFitCost(points,0,j) for i in range(0,j): cost = solutions[i]['cost'] + split_cost + lsqFitCost(points,i+1,j) if cost < best_cost: best_cost = cost best_split = i if best_split != None: solution[j] = {'cost':best_cost,'splits':solution[best_split]['splits']+[best_split]} else: solution[j] = {'cost':best_cost,'splits':[]} return solutions

it is ugly, and I did not check it (maybe there is an error related to the case when a split is not the best value), but I hope this helps you on the right track? Please note that lsqFitCost should do a lot of work on each iteration, but for a line with the least squares like this, you can do it much cheaper by supporting the current amounts used in the calculation ... you should use the least squares method in Google for more information. This can make the lsqFitCost constant, so the total time will be O (N ^ 2).

jon_darkstar · Answer 3 · 2010-11-03T06:10:35+0000

The main premise of dynamic programming is the optimization (lowering the "cost" or, in this case, error) of the recursive task while maintaining the cost of the sub-tasks as you go so that they are not reviewed (this is called memoization).

A bit late, so I won’t be too detailed, but it seems that this problem is your biggest problem with the most basic DP. DP is needed here because of the "segmeneted" quality. As your PDF shows, finding the best line by least squares is simple and does not require DP.

Opt (n) - e (i, n) + C + Opt (i-1) is our cost function, where C is the constant cost of a new line segment (without which the problem is trivial, and we will simply create new line segments for every two points)
e (i, n) - error of points i to n with one segment (not recursive)
Option (i-1) is the least cost recursively from the first point to (i-1) th

Now the algorithm

Make sure that the list of points is sorted by the values x M [0] = 0 --- self-evident
For all pairs (i, j) with i <j: find e (i, j) ---- (this will require nested for / foreach loops or an understanding structure. Save these values in a 2D array!)
For (j = 1..n): M [j] = min ([Opt (j) for i in the range (1, j)]

So basically, find the single line value between any two possible points, store in e
Find the lowest cost to j, for j from 1 to n. The values in the path will help with the subsequent calculation, so save them! Please note that I am also a parameter for Opt. M [n] is the overall optimized cost.

The question for you is How to determine where segmentation occurs? How can you use this and M to define a set of line segments once all over?

Hope this helps!

Kartik kukreja · Answer 4 · 2013-11-27T09:35:30+0000

The least squares problem requires finding the only line most suitable for the given points and has a good closed shape solution. This solution is used as a primitive to solve the problem of segmented least squares.

In the segmented least squares problem, we can have any number of segments to match the given points, and we have to pay the cost for each new segment used. If the cost of using a new segment was 0, we could ideally select all the points by passing a separate line through each point.

Now suppose we are trying to find many segments that are best suited for n given points. If we knew the best solutions for n-1 subtasks: best suited for the first point, best suited for the first 2 points, ..., best suited for the first n-1 points, then we could calculate the best solution for the original problems with n indicate the following:

The nth point lies on one segment, and this segment starts from some early point i, for some i = 1, 2, ..., n. We have already solved all the smaller subtasks, i.e. Having fewer n points, so we can find the best match for n points that minimize:

cost of best fit for first i-1 points (already calculated) + cost of one line that is best suited for points from i to n (using the least squares problem) + cost of using a new segment

The value of i, which minimizes the above value, gives us a solution: use the best fit for the subtask i-1 and the segment through the point i to n.

If you need more help, I wrote an explanation of the algorithm and provided a C ++ implementation here: http://kartikkukreja.wordpress.com/2013/10/21/segmented-least-squares-problem/

Segmented least squares algorithm, do not understand this concept of dynamic programming at all

More articles: