Polynomial Regression Predictions

Suppose I want to fit a linear regression model using a second-degree (orthogonal) polynomial and then predict the response. Here is the code for the first model (m1):

    x = 1:100
    y = -2 + 3*x - 5*x^2 + rnorm(100)
    m1 = lm(y ~ poly(x, 2))
    prd.1 = predict(m1, newdata = data.frame(x = 105:110))

Now I fit the same model, but instead of using poly(x, 2) directly, I use its individual columns:

    m2 = lm(y ~ poly(x, 2)[, 1] + poly(x, 2)[, 2])
    prd.2 = predict(m2, newdata = data.frame(x = 105:110))

Let's look at the summaries of m1 and m2.

    > summary(m1)

    Call:
    lm(formula = y ~ poly(x, 2))

    Residuals:
         Min       1Q   Median       3Q      Max 
    -2.50347 -0.48752 -0.07085  0.53624  2.96516 

    Coefficients:
                  Estimate Std. Error t value Pr(>|t|)    
    (Intercept) -1.677e+04  9.912e-02 -169168   <2e-16 ***
    poly(x, 2)1 -1.449e+05  9.912e-01 -146195   <2e-16 ***
    poly(x, 2)2 -3.726e+04  9.912e-01  -37588   <2e-16 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 0.9912 on 97 degrees of freedom
    Multiple R-squared:  1,     Adjusted R-squared:  1 
    F-statistic: 1.139e+10 on 2 and 97 DF,  p-value: < 2.2e-16

    > summary(m2)

    Call:
    lm(formula = y ~ poly(x, 2)[, 1] + poly(x, 2)[, 2])

    Residuals:
         Min       1Q   Median       3Q      Max 
    -2.50347 -0.48752 -0.07085  0.53624  2.96516 

    Coefficients:
                      Estimate Std. Error t value Pr(>|t|)    
    (Intercept)     -1.677e+04  9.912e-02 -169168   <2e-16 ***
    poly(x, 2)[, 1] -1.449e+05  9.912e-01 -146195   <2e-16 ***
    poly(x, 2)[, 2] -3.726e+04  9.912e-01  -37588   <2e-16 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 0.9912 on 97 degrees of freedom
    Multiple R-squared:  1,     Adjusted R-squared:  1 
    F-statistic: 1.139e+10 on 2 and 97 DF,  p-value: < 2.2e-16
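One can also verify programmatically that the two fits coincide (a quick check using the objects above; both comparisons should return TRUE):

    # Identical fitted values and coefficients; only the term labels differ.
    all.equal(fitted(m1), fitted(m2))
    all.equal(unname(coef(m1)), unname(coef(m2)))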

Thus, m1 and m2 describe the same fit. Now let's look at the predictions prd.1 and prd.2:

    > prd.1
            1         2         3         4         5         6 
    -54811.60 -55863.58 -56925.56 -57997.54 -59079.52 -60171.50 
    > prd.2
            1          2          3          4          5          6 
     49505.92   39256.72   16812.28  -17827.42  -64662.35 -123692.53 

Q1: Why is prd.2 so different from prd.1?

Q2: How can I reproduce prd.1 using the m2 model?

1 answer

m1 is the right way to do this. m2 opens up a whole world of pain...

To make predictions from m2, the model would need to know that it was fitted to an orthogonal set of basis functions, so that it could construct the same basis functions for the extrapolated new data values. Compare poly(1:10,2)[,2] with poly(1:12,2)[,2]: the first ten values do not match, because poly() builds a different orthogonal basis for each set of x values. If you fit the model explicitly using poly(x,2), then predict understands all this and does the right thing; with the subsetted columns in m2, predict simply re-evaluates poly(x,2) on the six new x values, which produces an unrelated basis.
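A minimal illustration of that mismatch (any fresh R session will do; the exact values depend on the orthogonalization, but the disagreement is the point):

    # Quadratic basis column computed over 1:10 ...
    b10 = poly(1:10, 2)[, 2]
    # ... versus the first ten entries of the one computed over 1:12.
    # poly() centres and rescales relative to the full range of x,
    # so the two bases disagree element by element.
    b12 = poly(1:12, 2)[1:10, 2]
    cbind(b10, b12)
    isTRUE(all.equal(b10, b12))  # FALSE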

What you need to do is make sure your prediction locations are transformed using the same set of basis functions used to build the model in the first place. You can use predict.poly for this (note that I name my explanatory variables x1 and x2 so that it is easy to match the names up):

    px = poly(x, 2)
    x1 = px[, 1]
    x2 = px[, 2]
    m3 = lm(y ~ x1 + x2)
    newx = 90:110
    # px is the previous poly object, so this calls predict.poly
    pnew = predict(px, newx)
    prd.3 = predict(m3, newdata = data.frame(x1 = pnew[, 1], x2 = pnew[, 2]))
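As a sanity check (reusing m1 from above), prd.3 should agree, up to numerical tolerance, with what m1 itself predicts over the same range:

    prd.1b = predict(m1, newdata = data.frame(x = 90:110))
    all.equal(unname(prd.3), unname(prd.1b))  # should be TRUE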
