Python Beginner - How to fit a regression line from clicks and display it graphically?

I am reading Python Programming by John Zelle and I am stuck on an exercise, shown in the image below.

You can view my code below. I know the code is very ugly. (Any advice is welcome)

Picture of Regression Exercise

Here is my code:

    from graphics import *

    def regression():

        # creating the window for the regression line
        win = GraphWin("Regression Line - Start Clicking!", 500, 500)
        win.setCoords(0.0, 0.0, 10.0, 10.0)
        rect = Rectangle(Point(0.5, 0.1), Point(2.5, 2.1))
        rect.setFill("red")
        rect.draw(win)
        Text(rect.getCenter(), "Done").draw(win)
        message = Text(Point(5, 0.5), "Click in this screen")
        message.draw(win)

        points = []  # list of points
        n = 0  # count variable
        sumX = 0
        sumY = 0

        while True:
            p = win.getMouse()
            p.draw(win)

            # if user clicks in a red square it exits the loop and calculates the regression line
            if (p.getX() >= 0.5 and p.getX() <= 2.5) and (p.getY() >= 0.1 and p.getY() <= 2.1):
                break

            n += 1  # count of the points

            # get the sum of the X and Y points
            sumX = sumX + p.getX()
            sumY = sumY + p.getY()

            # tuple of the X and Y points
            dot = (p.getX(), p.getY())
            points.append(dot)

        avgX = sumX / n
        avgY = sumY / n

        top = 0
        bottom = 0

        # my ugly attempt at the regression equation shown in the book
        for i in points:
            gp = 0
            numer = points[gp][0] * points[gp][1]
            top = top + numer

            denom = points[gp][0] ** 2
            bottom = bottom + denom
            gp += 1

        m = (top - sumX * sumY) / (bottom - sumX ** 2)

        y1 = avgY + m * (0.0 - avgX)
        y2 = avgY + m * (10.0 - avgX)

        regressionline = Line(Point(0, y1), Point(10.0, y2))
        regressionline.draw(win)

        raw_input("Press <Enter> to quit.")
        win.close()

    regression()

When I run the program, the regression line never seems to be the real line of best fit. I believe that I am misinterpreting the regression equation in my code. What needs to be changed to get the correct regression line?

+7
2 answers

Issues:

  • from my_library import * should be avoided; it is better to name exactly what you want from the library. This helps keep your namespace uncluttered.

  • You have one massive block of code; it is better to divide it into separate functions. This makes the code much easier to think about and debug, and can help you reuse code later. Of course, this is a toy problem and you are not going to reuse it, but the whole point of the exercise is to develop good habits, and factoring your code this way is definitely a good habit! A general rule of thumb is that if a function contains more than about a dozen lines of code, you should consider splitting it further.

  • The exercise asks you to keep running sums of x, y, xx, and xy as you receive the input points. I think this is kind of a bad idea, or at least more C-ish than Python-ish, because it forces you to do two different tasks at once (getting points and doing math on them). My advice: if you are getting points, get points; if you are doing math, do math; do not try to do both at once.

  • Likewise, I don't like how your regression calculation worries about where the sides of the window are. Why should it know or care about windows? I hope you like my solution to this :-)

Here is my reorganized version of your code:

    from graphics import GraphWin, Point, Line, Rectangle, Text

    def draw_window():
        # create canvas
        win = GraphWin("Regression Line - Start Clicking!", 500, 500)
        win.setCoords(0., 0., 10., 10.)
        # exit button
        rect = Rectangle(Point(0.5, 0.1), Point(2.5, 2.1))
        rect.setFill("red")
        rect.draw(win)
        Text(rect.getCenter(), "Done").draw(win)
        # instructions
        Text(Point(5., 0.5), "Click in this screen").draw(win)
        return win

    def get_points(win):
        points = []
        while True:
            p = win.getMouse()
            p.draw(win)

            # clicked the exit button?
            px, py = p.getX(), p.getY()
            if 0.5 <= px <= 2.5 and 0.1 <= py <= 2.1:
                break
            else:
                points.append((px, py))
        return points

    def do_regression(points):
        num = len(points)
        x_sum, y_sum, xx_sum, xy_sum = 0., 0., 0., 0.

        for x, y in points:
            x_sum += x
            y_sum += y
            xx_sum += x*x
            xy_sum += x*y

        x_mean, y_mean = x_sum/num, y_sum/num
        m = (xy_sum - num*x_mean*y_mean) / (xx_sum - num*x_mean*x_mean)

        # return the fitted line as a function of x
        def lineFn(xval):
            return y_mean + m*(xval - x_mean)

        return lineFn

    def main():
        # set up
        win = draw_window()
        points = get_points(win)

        # show regression line
        lineFn = do_regression(points)
        Line(
            Point(0., lineFn(0.)),
            Point(10., lineFn(10.))
        ).draw(win)

        # wait to close
        Text(Point(5., 5.), "Click to exit").draw(win)
        win.getMouse()
        win.close()

    if __name__=="__main__":
        main()
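A side benefit of keeping the math in its own function is that it can be checked without opening a window. The sketch below is only an illustration: it copies the least-squares arithmetic into a standalone function (fit_line is a hypothetical name, and it returns the slope and intercept instead of a closure) and runs it on made-up points that lie exactly on y = 2x + 1.

```python
def fit_line(points):
    """Least-squares fit over (x, y) tuples; returns (slope, intercept).

    Hypothetical helper for illustration. It raises ZeroDivisionError
    when the list is empty or all x values are identical.
    """
    num = len(points)
    x_sum = y_sum = xx_sum = xy_sum = 0.0
    for x, y in points:
        x_sum += x
        y_sum += y
        xx_sum += x * x
        xy_sum += x * y

    x_mean, y_mean = x_sum / num, y_sum / num
    slope = (xy_sum - num * x_mean * y_mean) / (xx_sum - num * x_mean * x_mean)
    intercept = y_mean - slope * x_mean
    return slope, intercept


# made-up sample data lying exactly on y = 2x + 1
sample = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
slope, intercept = fit_line(sample)
print(slope, intercept)
```

The printed slope and intercept should come out close to 2 and 1; if they do not, the problem is in the arithmetic rather than in the click handling or drawing.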
+4

The for loop is all messed up! You have i, which changes on each pass through the loop, but then you index with gp, which is always 0.

You want something more like:

    for (X, Y) in points:
        numer += X * Y
        denom += X * X

... or move gp = 0 to before the for loop, so it is no longer reset on every pass.

... or drop this part completely and accumulate sumXY and sumXX alongside sumX and sumY.

Either way, once you fix that it should be good (well, unless there is some other mistake ....).
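One candidate for that other mistake: the slope line in the question subtracts sumX * sumY and sumX ** 2, whereas the first answer's code subtracts n * avgX * avgY and n * avgX ** 2. A minimal sketch comparing the two on made-up data, assuming the loop has already been fixed so that the sums are right (the function names sums, slope_as_posted and slope_corrected are hypothetical and exist only for this comparison):

```python
def sums(points):
    """Return n, sumX, sumY, sumXY, sumXX for a list of (x, y) tuples."""
    n = len(points)
    sumX = sum(x for x, _ in points)
    sumY = sum(y for _, y in points)
    sumXY = sum(x * y for x, y in points)
    sumXX = sum(x * x for x, _ in points)
    return n, sumX, sumY, sumXY, sumXX


def slope_as_posted(points):
    # same expression as the question's "m = ..." line
    n, sumX, sumY, sumXY, sumXX = sums(points)
    return (sumXY - sumX * sumY) / (sumXX - sumX ** 2)


def slope_corrected(points):
    # uses n * avgX * avgY and n * avgX ** 2, as in the first answer
    n, sumX, sumY, sumXY, sumXX = sums(points)
    avgX, avgY = sumX / n, sumY / n
    return (sumXY - n * avgX * avgY) / (sumXX - n * avgX ** 2)


# made-up points lying exactly on y = 2x + 1, so the true slope is 2
points = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]
print("as posted:", slope_as_posted(points))
print("corrected:", slope_corrected(points))
```

On this data only the corrected version recovers a slope of 2, which suggests the question's m line needs the n * avgX * avgY form as well as the loop fix.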

+3