R version 2.15.0 (2012-03-30), RStudio 0.96.316, Windows XP (latest update)
I have a dataset with 40 variables and 15,000 observations. I would like to use bestglm to find good candidate models (logistic regression). I tried bestglm, but it does not work for a dataset of this moderate size. After several tests, it seems that bestglm fails once there are roughly 30 or more variables, at least on my computer (4 GB RAM, dual core).
You can test the limits of bestglm yourself:
    library(bestglm)

    bestBIC_test <- function(number_of_vars) {
      # Simulate data frame for logistic regression
      glm_sample <- as.data.frame(matrix(rnorm(100*number_of_vars), 100))

      # Get some 1/0 variable
      glm_sample[,number_of_vars][glm_sample[,number_of_vars] > mean(glm_sample[,number_of_vars]) ] <- 1
      glm_sample[,number_of_vars][glm_sample[,number_of_vars] != 1 ] <- 0

      # Try to calculate best model
      bestBIC <- bestglm(glm_sample, IC="BIC", family=binomial)
    }

    # Test bestglm with increasing number of variables
    bestBIC_test(10) # OK, running
    bestBIC_test(20) # OK, running
    bestBIC_test(25) # OK, running
    bestBIC_test(28) # Error: cannot allocate vector of size 1024.0 Mb
    bestBIC_test(30) # Error: cannot allocate vector of size 2.0 Gb
    bestBIC_test(40) # Error in rep(-Inf, 2^p) : invalid 'times' argument
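The `rep(-Inf, 2^p)` in the last error suggests that the search keeps one value per candidate subset, so the cost doubles with every added predictor. Below is a back-of-the-envelope sketch of that growth. It is my own estimate, not taken from the bestglm source: `subset_cost` is a hypothetical helper, and I assume one 8-byte double per subset and p = number_of_vars - 1 predictors, since the last column is the response. The exact vectors bestglm allocates may differ, so the sizes need not match the error messages above.

```r
# Rough cost of enumerating every subset of p predictors.
# Assumption: one double (8 bytes) is stored per candidate subset.
subset_cost <- function(p) {
  n_subsets <- 2^p
  data.frame(
    predictors         = p,
    subsets            = n_subsets,
    gib_per_vector     = n_subsets * 8 / 2^30,
    # R 2.15 cannot create vectors longer than .Machine$integer.max elements
    exceeds_max_length = n_subsets > .Machine$integer.max
  )
}

# Predictor counts corresponding to the test calls above (number_of_vars - 1)
do.call(rbind, lapply(c(9, 19, 24, 27, 29, 39), subset_cost))
```

Under these assumptions, a single vector for 39 predictors would need more elements than an R 2.15 vector can hold, which would explain why the last call fails with an invalid 'times' argument instead of a memory allocation error.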
Are there any alternatives that I can use in R to look for possible good models?