Linear regression using lm() - surprised by the result

I fitted a linear regression to my data using the lm function. Everything runs (there is no error message), but the result surprises me: it looks as if R "skips" a group of points, i.e. the intercept and slope do not fit them. I mean, for example, the group of points around x = 15-25, y = 0-20.

My questions:

  • Is there a function for comparing "expected" coefficients against the coefficients that lm calculates?
  • Did I make a silly coding mistake that forces lm to do this?

Following some answers: more info on x and y

x and y are visual estimates of disease symptoms. They have the same uncertainty.

[Figure: plot of the data with the fitted regression line and the line I expected]

Data and code are here:

    x1 = c(24.0, 23.9, 23.6, 21.6, 21.0, 20.8, 22.4, 22.6,
           21.6, 21.2, 19.0, 19.4, 21.1, 21.5, 21.5, 20.1, 20.1,
           20.1, 17.2, 18.6, 21.5, 18.2, 23.2, 20.4, 19.2, 22.4,
           18.8, 17.9, 19.1, 17.9, 19.6, 18.1, 17.6, 17.4, 17.5,
           17.5, 25.2, 24.4, 25.6, 24.3, 24.6, 24.3, 29.4, 29.4,
           29.1, 28.5, 27.2, 27.9, 31.5, 31.5, 31.5, 27.8, 31.2,
           27.4, 28.8, 27.9, 27.6, 26.9, 28.0, 28.0, 33.0, 32.0,
           34.2, 34.0, 32.6, 30.8)
    y1 = c(100.0, 95.5, 93.5, 100.0, 98.5, 99.5, 34.8,
           45.8, 47.5, 17.4, 42.6, 63.0, 6.9, 12.1, 30.5,
           10.5, 14.3, 41.1, 2.2, 20.0, 9.8, 3.5, 0.5, 3.5, 5.7,
           3.1, 19.2, 6.4, 1.2, 4.5, 5.7, 3.1, 19.2, 6.4, 1.2,
           4.5, 81.5, 70.5, 91.5, 75.0, 59.5, 73.3, 66.5,
           47.0, 60.5, 47.5, 33.0, 62.5, 87.0, 86.0, 77.0,
           86.0, 83.0, 78.5, 83.0, 83.5, 73.0, 69.5, 82.5, 78.5,
           84.0, 93.5, 83.5, 96.5, 96.0, 97.5)

    ## x11()
    plot(x1, y1, xlim = c(0, 35), ylim = c(0, 100))

    # linear regression
    reg_lin = lm(y1 ~ x1)
    abline(reg_lin, lty = "solid", col = "royalblue")
    text(12.5, 25, labels = "R result", col = "royalblue", cex = 0.85)
    text(12.5, 20, labels = bquote(y == .(5.26) * x - .(76)), col = "royalblue", cex = 0.85)

    # result I would have imagined
    abline(a = -150, b = 8, lty = "dashed", col = "red")
    text(27.5, 25, labels = "What I think is better", col = "red", cex = 0.85)
    text(27.5, 20, labels = bquote(y == .(8) * x - .(150)), col = "red", cex = 0.85)
Tags: r, linear-regression, lm, least-squares, orthogonal
2 answers

Try the following:

    reg_lin_int <- reg_lin$coefficients[1]
    reg_lin_slp <- reg_lin$coefficients[2]

    sum((y1 - (reg_lin_int + reg_lin_slp * x1))^2)
    # [1] 39486.33
    sum((y1 - (-150 + 8 * x1))^2)
    # [1] 55583.18

Note that the sum of squared residuals is lower for the lm line. This is to be expected, since reg_lin_int and reg_lin_slp are guaranteed to produce the minimum sum of squared errors.
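If you want to verify this independently, the OLS coefficients have a simple closed form, so you can recompute them by hand (a quick sketch using base R only):

    # Closed-form OLS: slope = cov(x, y) / var(x), intercept from the means
    slope_ols <- cov(x1, y1) / var(x1)
    int_ols <- mean(y1) - slope_ols * mean(x1)
    c(int_ols, slope_ols)  # matches coef(reg_lin)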

Intuitively, we know that estimators based on a squared loss function are sensitive to outliers. The lm line "misses" the group at the bottom because it is pulled toward the group in the upper left corner, which lies much farther from the bulk of the data - and squaring the distances gives those points much more weight.
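You can see how much pull that upper-left group has by comparing its share of the total loss under squared versus absolute loss (a rough sketch; the selector y1 > 90 & x1 < 25 for that group is my assumption, not from the original):

    # Residuals from the lm fit, and the six points in the upper left
    res <- residuals(reg_lin)
    grp <- y1 > 90 & x1 < 25  # assumed selector for the upper-left group

    # Share of the total loss contributed by that group:
    sum(res[grp]^2) / sum(res^2)        # under squared loss
    sum(abs(res[grp])) / sum(abs(res))  # under absolute loss

Squaring inflates the share of these far-away points, which is exactly why they dominate the OLS fit.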

In fact, if we use least absolute deviations regression (i.e., specify an absolute loss function instead of a squared one), the result is much closer to your guess:

    library(quantreg)
    lad_reg <- rq(y1 ~ x1)
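To reproduce the plot below, you can overlay the fitted LAD line on the existing plot (the line style and color here are my own choice, not from the original):

    abline(lad_reg, lty = "dashed", lwd = 2, col = "darkorange")  # add the LAD line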

[Figure: data with the LAD regression line added]

(Tip: use lwd to make the lines in your plots easier to see.)

Something that gets even closer to what you had in mind is Total Least Squares, as mentioned by @nongkrong and @MikeWilliamson. Here is the result of TLS on your example:

    # TLS via PCA: the first principal component of (x1, y1) points along the
    # line that minimizes the summed squared orthogonal distances
    v <- prcomp(cbind(x1, y1))$rotation
    bbeta <- v[2, 1] / v[1, 1]             # slope: y-component over x-component of PC1
    inter <- mean(y1) - bbeta * mean(x1)   # the TLS line passes through the centroid
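Unlike OLS, which measures errors vertically, TLS measures them perpendicular to the fitted line, so it treats x and y symmetrically. To add the TLS line to the plot (again, line style and color are my own choice):

    abline(inter, bbeta, lty = "dotdash", lwd = 2, col = "purple")  # add the TLS line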

[Figure: data with the TLS line added]


You already have a good answer, but maybe this is also useful:

As you know, OLS minimizes the sum of squared errors in the y direction. This is appropriate when the uncertainty of your x values is negligible, which is often the case. But perhaps that is not true for your data. If we assume that the uncertainties in x and y are equal and use Deming regression, we get a fit much more like what you expected.

    library(MethComp)
    dem_reg <- Deming(x1, y1)
    abline(dem_reg[1:2], col = "green")
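If the two uncertainties are not actually equal, Deming() also accepts an assumed ratio of the error variances via its vr argument (a sketch; vr defaults to 1, and ?Deming gives the exact definition of the ratio):

    # Same fit, but assuming one error variance is twice the other (illustrative value)
    dem_reg2 <- Deming(x1, y1, vr = 2)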

[Figure: final plot with the Deming regression line]

You did not provide detailed information about your data, so this may or may not be appropriate.

