Get t-statistics for the regression coefficients of an "mlm" object returned by `lm()`

I have used lm() to fit multiple regression models for many (~1 million) response variables in R. For example:

 allModels <- lm(t(responseVariablesMatrix) ~ modelMatrix) 

This returns an object of class "mlm", which is essentially one huge object containing all the models. I want the t-statistic of the first coefficient in each model. I can get it from summary(allModels), but that is very slow on data this big, and it also returns a lot of information I don't need.
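For comparison, the summary()-based approach described above can be sketched like this. The data sizes and values are small hypothetical stand-ins for the real `responseVariablesMatrix` and `modelMatrix`, and "first coefficient" is assumed to mean the intercept.

```r
# Hypothetical stand-in data: 200 response variables (rows) on 50 samples
set.seed(1)
nObs  <- 50
nResp <- 200
modelMatrix <- matrix(rnorm(nObs * 3), nrow = nObs)
responseVariablesMatrix <- matrix(rnorm(nResp * nObs), nrow = nResp)

# One call fits every response at once and returns an "mlm" object
allModels <- lm(t(responseVariablesMatrix) ~ modelMatrix)

# summary() on an "mlm" builds a full summary.lm object for every response,
# which is where the time and the unwanted information come from
allSummaries <- summary(allModels)

# t-statistic of the first coefficient (the intercept) of each model
tFirstSlow <- sapply(allSummaries, function(s) s$coefficients[1, "t value"])
```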

Is there a way to calculate the t-statistics manually that is faster than calling summary()?

Thanks!

1 answer

You can hack the summary.lm() function: keep only the parts you need and leave out the rest.

Suppose you fit:

    nVariables <- 5
    nObs <- 15
    y <- rnorm(nObs)
    x <- matrix(rnorm(nVariables * nObs), nrow = nObs)
    allModels <- lm(y ~ x)

Then the following is the code from summary.lm() with all the excess baggage removed (note that all error handling has been removed as well):

    p <- allModels$rank
    rdf <- allModels$df.residual
    Qr <- allModels$qr
    n <- NROW(Qr$qr)               # not needed for tval
    p1 <- 1L:p
    r <- allModels$residuals
    f <- allModels$fitted.values
    w <- allModels$weights         # NULL for an unweighted fit; not used below
    mss <- if (attr(allModels$terms, "intercept")) sum((f - mean(f))^2) else sum(f^2)
    # Single response only: for an "mlm" fit, sum(r^2) pools all responses
    rss <- sum(r^2)
    resvar <- rss/rdf              # residual variance estimate
    R <- chol2inv(Qr$qr[p1, p1, drop = FALSE])   # (X'X)^-1 from the QR factor
    se <- sqrt(diag(R) * resvar)
    est <- allModels$coefficients[Qr$pivot[p1]]
    tval <- est/se

tval is now a vector of t-statistics, the same as what you get from

 summary(allModels)$coefficients[,3] 
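For reference, the stripped-down code computes the usual OLS t-statistic. With X the n × p model matrix (assumed full rank), X = QR its QR decomposition, r the residual vector and β̂ the estimated coefficients:

```latex
\hat\sigma^2 = \frac{\sum_{i=1}^{n} r_i^2}{n - p}, \qquad
(X^\top X)^{-1} = (R^\top R)^{-1}, \qquad
t_j = \frac{\hat\beta_j}{\sqrt{\hat\sigma^2 \,\big[(X^\top X)^{-1}\big]_{jj}}}
```

Here `resvar`, `chol2inv(Qr$qr[p1, p1])` and `est` correspond to σ̂², (XᵀX)⁻¹ and β̂. In an "mlm" fit all responses share the same X, so (XᵀX)⁻¹ only has to be computed once; σ̂² and β̂ differ per response.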

If memory becomes a problem with a large model, you can rewrite the code to create fewer intermediate objects by combining several assignments into fewer lines.
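As a sketch of what combining the assignments might look like for the single-response example (assuming an unweighted, full-rank fit), the whole computation collapses into one expression that only keeps `p1` and `tval`:

```r
set.seed(1)
nVariables <- 5
nObs <- 15
y <- rnorm(nObs)
x <- matrix(rnorm(nVariables * nObs), nrow = nObs)
allModels <- lm(y ~ x)

# Same computation as the step-by-step version, without intermediate objects:
# estimates (pivot order) / sqrt(diag((X'X)^-1) * RSS / residual df)
p1 <- seq_len(allModels$rank)
tval <- allModels$coefficients[allModels$qr$pivot[p1]] /
  sqrt(diag(chol2inv(allModels$qr$qr[p1, p1, drop = FALSE])) *
       sum(allModels$residuals^2) / allModels$df.residual)
```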

I know this is hacky, but because it skips everything else summary() computes, it should be close to as fast as you can get. It would be neater to wrap all of this code in a function.
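Wrapped in a function and adapted to the "mlm" case from the question, a minimal sketch might look like the following. `mlmTStats` is a hypothetical helper, not part of base R, and it assumes an unweighted fit. The key change from the single-response code is that the residual variance is computed per response with `colSums()`, while (XᵀX)⁻¹ is computed once and shared.

```r
# Hypothetical helper: t-statistics for every coefficient of every response
# in an unweighted "mlm" fit (rows = coefficients, columns = responses)
mlmTStats <- function(fit) {
  stopifnot(inherits(fit, "mlm"), is.null(fit$weights))
  p1 <- seq_len(fit$rank)
  Qr <- fit$qr
  # (X'X)^-1 from the R factor; identical for all responses
  XtXinv <- chol2inv(Qr$qr[p1, p1, drop = FALSE])
  # One residual variance per response (column of the residual matrix)
  resvar <- colSums(fit$residuals^2) / fit$df.residual
  # Standard errors as a p x k matrix
  se <- sqrt(outer(diag(XtXinv), resvar))
  est <- fit$coefficients[Qr$pivot[p1], , drop = FALSE]
  est / se
}

# Usage with small hypothetical stand-in data
set.seed(1)
nObs  <- 50
nResp <- 200
modelMatrix <- matrix(rnorm(nObs * 3), nrow = nObs)
responseVariablesMatrix <- matrix(rnorm(nResp * nObs), nrow = nResp)
allModels <- lm(t(responseVariablesMatrix) ~ modelMatrix)

tAll <- mlmTStats(allModels)
tFirst <- tAll[1, ]   # first coefficient (intercept) of each model
```

If only the first coefficient is needed, `est[1, ] / sqrt(XtXinv[1, 1] * resvar)` would avoid building the full p × k matrix, which may matter with a million responses.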

