Predict the use of a file output with standard errors

Question

Predict the use of a file output with standard errors

Is there a way to get the forecast behavior with standard errors from lfe::felm if fixed effects are swept away using the projection method in felm ? This question is very similar to the question here , but none of the answers to this question can be used to estimate standard errors or confidence / forecasting intervals. I know there is currently no predict.felm , but I am wondering if there are workarounds like the ones that were linked above that can also work to estimate the prediction interval

 library(DAAG) library(lfe) model1 <- lm(data = cps1, re74 ~ age + nodeg + marr) predict(model1, newdata = data.frame(age=40, nodeg = 0, marr=1), se.fit = T, interval="prediction")$fit # Result: fit lwr upr # 1 18436.18 2339.335 34533.03 model2 <- felm(data = cps1, re74 ~ age | nodeg + marr) predict(model2, newdata = data.frame(age=40, nodeg = 0, marr=1), se.fit = T, interval="prediction")$fit # Does not work

The goal is to estimate the prediction interval for yhat, for which, I think, I will need to calculate the full variance-covariance matrix (including fixed effects). I could not understand how to do this, and I wonder if it is possible to do this from a computational point of view.

+7

r lfe prediction

pbaylis Feb 06 '18 at 2:15

source share

2 answers

From your first predict(.) Model predict(.) get the following:

 # fit lwr upr # 1 18436.18 2339.335 34533.03

Following 李哲源 , we can also achieve these results manually.

 beta.hat.1 <- coef(model1) # save coefficients # model matrix: age=40, nodeg = 0, marr=1: X.1 <- cbind(1, matrix(c(40, 0, 1), ncol=3)) pred.1 <- as.numeric(X.1 %*% beta.hat.1) # prediction V.1 <- vcov(model1) # save var-cov matrix se2.1 <- unname(rowSums((X.1 %*% V.1) * X.1)) # prediction var alpha.1 <- qt((1-0.95)/2, df = model1$df.residual) # 5 % level pred.1 + c(alpha.1, -alpha.1) * sqrt(se2.1) # 95%-CI # [1] 18258.18 18614.18 sigma2.1 <- sum(model1$residuals ^ 2) / model1$df.residual # sigma.sq PI.1 <- pred.1 + c(alpha.1, -alpha.1) * sqrt(se2.1 + sigma2.1) # prediction interval matrix(c(pred.1, PI.1), nrow = 1, dimnames = list(1, c("fit", "lwr", "upr"))) # fit lwr upr # 1 18436.18 2339.335 34533.03

Now your related example applies to multiple FEs, we get the following results:

 lm.model <- lm(data=demeanlist(cps1[, c(8, 2)], list(as.factor(cps1$nodeg), as.factor(cps1$marr))), re74 ~ age) fe <- getfe(model2) predict(lm.model, newdata = data.frame(age = 40)) + fe$effect[fe$idx=="1"] # [1] 15091.75 10115.21

The first value is c, and the second without FE (try fe$effect[fe$idx=="1"] ).

Now we follow the manual approach above.

 beta.hat <- coef(model2) # coefficient x <- 40 # age = 40 pred <- as.numeric(x %*% beta.hat) # prediction V <- model2$vcv # var/cov se2 <- unname(rowSums((x %*% V) * x)) # prediction var alpha <- qt((1-0.95)/2, df = model2$df.residual) # 5% level pred + c(alpha, -alpha) * sqrt(se2) # CI # [1] 9599.733 10630.697 sigma2 <- sum(model2$residuals ^ 2) / model2$df.residual # sigma^2 PI <- pred + c(alpha, -alpha) * sqrt(se2 + sigma2) # PI matrix(c(pred, PI), nrow = 1, dimnames = list(1, c("fit", "lwr", "upr"))) # output # fit lwr upr # 1 10115.21 -5988.898 26219.33

As we can see, the fit is the same as the associated exemplary approach, but now with a prediction interval. (Disclaimer: the logic of the approach should be simple, PI values should still be evaluated, for example, in Stata with reghdfe .)

Edit: If you want to get exactly the same result from felm() that predict.lm() gives with linear model1 , you just need to re-enable the “included” fixed effects in your model (see model3 below). Just follow the same approach. For greater convenience, you can easily turn it into a function.

 library(DAAG) library(lfe) model3 <- felm(data = cps1, re74 ~ age + nodeg + marr) pv <- c(40, 0, 1) # prediction x-values predict0.felm <- function(mod, pv.=pv) { beta.hat <- coef(mod) # coefficient x <- cbind(1, matrix(pv., ncol=3)) # prediction vector pred <- as.numeric(x %*% beta.hat) # prediction V <- mod[['vcv'] ] # var/cov se2 <- unname(rowSums((x %*% V) * x)) # prediction var alpha <- qt((1-0.95)/2, df = mod[['df.residual']]) # 5% level CI <- structure(pred + c(alpha, -alpha) * sqrt(se2), names=c("CI lwr", "CI upr")) # CI sigma2 <- sum(mod[['residuals']] ^ 2) / mod[['df.residual']] # sigma^2 PI <- pred + c(alpha, -alpha) * sqrt(se2 + sigma2) # PI mx <- matrix(c(pred, PI), nrow = 1, dimnames = list(1, c("PI fit", "PI lwr", "PI upr"))) # output list(CI, mx) } predict0.felm(model3)[[2]] # PI fit PI lwr PI upr # 1 18436.18 2339.335 34533.03

With felm() you can achieve the same prediction interval as with predict.lm() .

+2

jaySf Feb 09 '18 at 12:58

source share

pbaylis · Accepted Answer · 2018-02-15T06:57:22+0000

After talking with several people, I don’t think that you can get an estimate of the distribution yhat = Xb (where X includes both covariates and fixed effects) directly from felm, which boils down to this question. However, it is possible that they load them. The following code does this in parallel. There are opportunities for improving performance, but this gives a general idea.

Note: here I do not calculate the full prediction interval, but only SE on Xb, but getting the prediction interval is simple - just add the root sigma ^ 2 in SE.

 library(DAAG) library(lfe) library(parallel) model1 <- lm(data = cps1, re74 ~ age + nodeg + marr) yhat_lm <- predict(model1, newdata = data.frame(age=40, nodeg = 0, marr=1), se.fit = T) set.seed(42) boot_yhat <- function(b) { print(b) n <- nrow(cps1) boot <- cps1[sample(1:n, n, replace=T),] lm.model <- lm(data=demeanlist(boot[, c("re74", "age")], list(factor(boot$nodeg), factor(boot$marr))), formula = re74 ~ age) fe <- getfe(felm(data = boot, re74 ~ age | nodeg + marr)) bootResult <- predict(lm.model, newdata = data.frame(age = 40)) + fe$effect[fe$fe == "nodeg" & fe$idx==0] + fe$effect[fe$fe == "marr" & fe$idx==1] return(bootResult) } B = 1000 yhats_boot <- mclapply(1:B, boot_yhat) plot(density(rnorm(10000, mean=yhat_lm$fit, sd=yhat_lm$se.fit))) lines(density(yhats), col="red")

Predict the use of a file output with standard errors

More articles: