Applying lm () and pred () to multiple columns in a data frame

I have an example dataset below.

train<-data.frame(x1 = c(4,5,6,4,3,5), x2 = c(4,2,4,0,5,4), x3 = c(1,1,1,0,0,1), x4 = c(1,0,1,1,0,0), x5 = c(0,0,0,1,1,1)) 

Suppose I want to create separate models for columns x3 , x4 , x5 based on columns x1 and x2 . for instance

 lm1 <- lm(x3 ~ x1 + x2) lm2 <- lm(x4 ~ x1 + x2) lm3 <- lm(x5 ~ x1 + x2) 

Then I want to take these models and apply them to the test suite using the forecast, and then create a matrix in which each model result will be displayed as a column.

 test <- data.frame(x1 = c(4,3,2,1,5,6), x2 = c(4,2,1,6,8,5)) p1 <- predict(lm1, newdata = test) p2 <- predict(lm2, newdata = test) p3 <- predict(lm3, newdata = test) final <- cbind(p1, p2, p3) 

This is a simplified version where you can do it step by step, the actual data is too large. Is there a way to create a function or use the for statement to combine this in one or two steps?

0
source share
1 answer

I had a tendency to close your question as a duplicate. Installing a linear model with several LHSs , but, unfortunately, the forecasting problem is not considered there. On the other hand, Predicting the linear model object 'mlm' from lm() indicates a forecast, but is a bit far from your situation, since you are working with the interface formula instead of the matrix interface.

I could not find the perfect duplicate target in the "mlm" tag. Therefore, I consider it a good idea to include my answer in this tag. As I said in related questions, predict.mlm does not support se.fit , and at the moment this is also the missing problem in the "mlm" tag. Therefore, I will take this chance to fill such a gap.


Here is the function to get the standard prediction error:

 f <- function (mlmObject, newdata) { ## model formula form <- formula(mlmObject) ## drop response (LHS) form[[2]] <- NULL ## prediction matrix X <- model.matrix(form, newdata) Q <- forwardsolve(t(qr.R(mlmObject$qr)), t(X)) ## unscaled prediction standard error unscaled.se <- sqrt(colSums(Q ^ 2)) ## residual standard error sigma <- sqrt(colSums(residuals(mlmObject) ^ 2) / mlmObject$df.residual) ## scaled prediction standard error tcrossprod(unscaled.se, sigma) } 

In this example you can do

 ## fit an `mlm` fit <- lm(cbind(x3, x4, x5) ~ x1 + x2, data = train) ## prediction (mean only) pred <- predict(fit, newdata = test) # x3 x4 x5 #1 0.555956679 0.38628159 0.60649819 #2 0.003610108 0.47653430 0.95848375 #3 -0.458483755 0.48014440 1.27256318 #4 -0.379061372 -0.03610108 1.35920578 #5 1.288808664 0.12274368 0.17870036 #6 1.389891697 0.46570397 0.01624549 ## prediction error pred.se <- f(fit, newdata = test) # [,1] [,2] [,3] #[1,] 0.1974039 0.3321300 0.2976205 #[2,] 0.3254108 0.5475000 0.4906129 #[3,] 0.5071956 0.8533510 0.7646849 #[4,] 0.6583707 1.1077014 0.9926075 #[5,] 0.5049637 0.8495959 0.7613200 #[6,] 0.3552794 0.5977537 0.5356451 

We can verify f correct:

 ## `lm1`, `lm2` and `lm3` are defined in your question predict(lm1, test, se.fit = TRUE)$se.fit # 1 2 3 4 5 6 #0.1974039 0.3254108 0.5071956 0.6583707 0.5049637 0.3552794 predict(lm2, test, se.fit = TRUE)$se.fit # 1 2 3 4 5 6 #0.3321300 0.5475000 0.8533510 1.1077014 0.8495959 0.5977537 predict(lm3, test, se.fit = TRUE)$se.fit # 1 2 3 4 5 6 #0.2976205 0.4906129 0.7646849 0.9926075 0.7613200 0.5356451 
0
source

All Articles