How does calc.lm () calculate the confidence interval and the prediction interval?

Question

How does calc.lm () calculate the confidence interval and the prediction interval?

I performed a regression:

CopierDataRegression <- lm(V1~V2, data=CopierData1)

and my task was to get

90% confidence interval for average response V2=6 and
90% prediction interval when V2=6 .

I used the following code:

 X6 <- data.frame(V2=6) predict(CopierDataRegression, X6, se.fit=TRUE, interval="confidence", level=0.90) predict(CopierDataRegression, X6, se.fit=TRUE, interval="prediction", level=0.90)

and I got (87.3, 91.9) and (74.5, 104.8) , which seems correct, since the PI must be wider.

The output for both also included se.fit = 1.39 , which was the same. I do not understand what this standard error is. Should the standard error be greater for PI vs CI? How to find these two different standard errors in R?

Data:

 CopierData1 <- structure(list(V1 = c(20L, 60L, 46L, 41L, 12L, 137L, 68L, 89L, 4L, 32L, 144L, 156L, 93L, 36L, 72L, 100L, 105L, 131L, 127L, 57L, 66L, 101L, 109L, 74L, 134L, 112L, 18L, 73L, 111L, 96L, 123L, 90L, 20L, 28L, 3L, 57L, 86L, 132L, 112L, 27L, 131L, 34L, 27L, 61L, 77L), V2 = c(2L, 4L, 3L, 2L, 1L, 10L, 5L, 5L, 1L, 2L, 9L, 10L, 6L, 3L, 4L, 8L, 7L, 8L, 10L, 4L, 5L, 7L, 7L, 5L, 9L, 7L, 2L, 5L, 7L, 6L, 8L, 5L, 2L, 2L, 1L, 4L, 5L, 9L, 7L, 1L, 9L, 2L, 2L, 4L, 5L)), .Names = c("V1", "V2"), class = "data.frame", row.names = c(NA, -45L))

+5

r regression linear-regression lm prediction

Mitty Jun 29 '16 at 20:30

source share

2 answers

I don't know if there is a quick way to extract the standard error for the prediction interval, but you can always cancel the intervals for SE (although this is not a very elegant approach):

 m <- lm(V1 ~ V2, data = d) newdat <- data.frame(V2=6) tcrit <- qt(0.95, m$df.residual) a <- predict(m, newdat, interval="confidence", level=0.90) cat("CI SE", (a[1, "upr"] - a[1, "fit"]) / tcrit, "\n") b <- predict(m, newdat, interval="prediction", level=0.90) cat("PI SE", (b[1, "upr"] - b[1, "fit"]) / tcrit, "\n")

Please note that CI SE is the same value from se.fit .

+2

MAB Jun 29 '16 at 21:32

source share

李哲源 · Accepted Answer · 2016-06-29T21:33:00+0000

Short answer

 ## no need to specify "interval"; even no need to specify "level" z <- predict(CopierDataRegression, X6, se.fit=TRUE)

Standard error used for CI:

 se.CI <- z$se.fit # [1] 1.396411

Standard error used for PI:

 se.PI <- sqrt(z$se.fit^2 + z$residual.scale^2) # [1] 9.022228

To calculate the confidence / forecast interval at a significance level of 90%, do:

 alpha <- qt((1-0.9)/2, df = z$df) # [1] -1.681071 CI <- z$fit + c(alpha, -alpha) * se.CI # [1] 87.28387 91.97880 PI <- z$fit + c(alpha, -alpha) * se.PI # [1] 74.46433 104.79833

Longer math answer

When you enter a linear model, the corresponding model is presented in matrix form:

y = X% *% beta.hat

You can get beta.hat from

 beta.hat <- CopierDataRegression$coefficients # (Intercept) V2 # -0.5801567 15.0352480

and its covariance matrix from

 V <- vcov(CopierDataRegression) # (Intercept) V2 # (Intercept) 7.862086 -1.1927966 # V2 -1.192797 0.2333733

When we make a prediction with the Xp prediction matrix, we predicted the average value:

 Xp <- model.matrix(~ V2, X6) pred <- as.numeric(Xp %*% beta.hat) # [1] 89.63133

and forecast variance:

 se2 <- unname(rowSums((Xp %*% V) * Xp)) # [1] 1.949963

For a 90% confidence interval, we do

 alpha <- qt((1-0.9)/2, df = CopierDataRegression$df.residual) # [1] -1.681071 CI <- pred + c(alpha, -alpha) * sqrt(se2) # [1] 87.28387 91.97880

The prediction interval is a wider interval since it additionally takes into account sigma2 noise sigma2 . Pearson's sigma2 is equal to

 sigma2 <- sum(CopierDataRegression$residuals ^ 2) / CopierDataRegression$df.residual # [1] 79.45063

Thus, a 90% prediction interval:

 PI <- pred + c(alpha, -alpha) * sqrt(se2 + sigma2) # [1] 74.46433 104.79833

At the end, z <- predict(CopierDataRegression, X6, se.fit=TRUE) returns

z$fit : pred higher;
z$se.fit : sqrt(se2) above;
z$df : df.residuals above;
z$residual.scale : sqrt(sigma2) above.

How does calc.lm () calculate the confidence interval and the prediction interval?

More articles: