How does calc.lm () calculate the confidence interval and the prediction interval?

I performed a regression:

CopierDataRegression <- lm(V1~V2, data=CopierData1) 

and my task was to get

  • 90% confidence interval for average response V2=6 and
  • 90% prediction interval when V2=6 .

I used the following code:

 X6 <- data.frame(V2=6) predict(CopierDataRegression, X6, se.fit=TRUE, interval="confidence", level=0.90) predict(CopierDataRegression, X6, se.fit=TRUE, interval="prediction", level=0.90) 

and I got (87.3, 91.9) and (74.5, 104.8) , which seems correct, since the PI must be wider.

The output for both also included se.fit = 1.39 , which was the same. I do not understand what this standard error is. Should the standard error be greater for PI vs CI? How to find these two different standard errors in R? enter image description here


Data:

 CopierData1 <- structure(list(V1 = c(20L, 60L, 46L, 41L, 12L, 137L, 68L, 89L, 4L, 32L, 144L, 156L, 93L, 36L, 72L, 100L, 105L, 131L, 127L, 57L, 66L, 101L, 109L, 74L, 134L, 112L, 18L, 73L, 111L, 96L, 123L, 90L, 20L, 28L, 3L, 57L, 86L, 132L, 112L, 27L, 131L, 34L, 27L, 61L, 77L), V2 = c(2L, 4L, 3L, 2L, 1L, 10L, 5L, 5L, 1L, 2L, 9L, 10L, 6L, 3L, 4L, 8L, 7L, 8L, 10L, 4L, 5L, 7L, 7L, 5L, 9L, 7L, 2L, 5L, 7L, 6L, 8L, 5L, 2L, 2L, 1L, 4L, 5L, 9L, 7L, 1L, 9L, 2L, 2L, 4L, 5L)), .Names = c("V1", "V2"), class = "data.frame", row.names = c(NA, -45L)) 
+5
r regression linear-regression lm prediction
source share
2 answers

Short answer

 ## no need to specify "interval"; even no need to specify "level" z <- predict(CopierDataRegression, X6, se.fit=TRUE) 

Standard error used for CI:

 se.CI <- z$se.fit # [1] 1.396411 

Standard error used for PI:

 se.PI <- sqrt(z$se.fit^2 + z$residual.scale^2) # [1] 9.022228 

To calculate the confidence / forecast interval at a significance level of 90%, do:

 alpha <- qt((1-0.9)/2, df = z$df) # [1] -1.681071 CI <- z$fit + c(alpha, -alpha) * se.CI # [1] 87.28387 91.97880 PI <- z$fit + c(alpha, -alpha) * se.PI # [1] 74.46433 104.79833 

Longer math answer

When you enter a linear model, the corresponding model is presented in matrix form:

y = X% *% beta.hat

You can get beta.hat from

 beta.hat <- CopierDataRegression$coefficients # (Intercept) V2 # -0.5801567 15.0352480 

and its covariance matrix from

 V <- vcov(CopierDataRegression) # (Intercept) V2 # (Intercept) 7.862086 -1.1927966 # V2 -1.192797 0.2333733 

When we make a prediction with the Xp prediction matrix, we predicted the average value:

 Xp <- model.matrix(~ V2, X6) pred <- as.numeric(Xp %*% beta.hat) # [1] 89.63133 

and forecast variance:

 se2 <- unname(rowSums((Xp %*% V) * Xp)) # [1] 1.949963 

For a 90% confidence interval, we do

 alpha <- qt((1-0.9)/2, df = CopierDataRegression$df.residual) # [1] -1.681071 CI <- pred + c(alpha, -alpha) * sqrt(se2) # [1] 87.28387 91.97880 

The prediction interval is a wider interval since it additionally takes into account sigma2 noise sigma2 . Pearson's sigma2 is equal to

 sigma2 <- sum(CopierDataRegression$residuals ^ 2) / CopierDataRegression$df.residual # [1] 79.45063 

Thus, a 90% prediction interval:

 PI <- pred + c(alpha, -alpha) * sqrt(se2 + sigma2) # [1] 74.46433 104.79833 

At the end, z <- predict(CopierDataRegression, X6, se.fit=TRUE) returns

  • z$fit : pred higher;
  • z$se.fit : sqrt(se2) above;
  • z$df : df.residuals above;
  • z$residual.scale : sqrt(sigma2) above.
+17
source share

I don't know if there is a quick way to extract the standard error for the prediction interval, but you can always cancel the intervals for SE (although this is not a very elegant approach):

 m <- lm(V1 ~ V2, data = d) newdat <- data.frame(V2=6) tcrit <- qt(0.95, m$df.residual) a <- predict(m, newdat, interval="confidence", level=0.90) cat("CI SE", (a[1, "upr"] - a[1, "fit"]) / tcrit, "\n") b <- predict(m, newdat, interval="prediction", level=0.90) cat("PI SE", (b[1, "upr"] - b[1, "fit"]) / tcrit, "\n") 

Please note that CI SE is the same value from se.fit .

+2
source share

All Articles