I want to run lm() on a large dataset with 50M+ observations and two predictors. The analysis runs on a remote server with only 10 GB of memory available for the data. I tested lm() on 10K cases sampled from the data, and the resulting object was over 2 GB in size.
I need the object of class "lm" returned by lm() ONLY to produce summary statistics of the model (summary(lm_object)) and to make predictions (predict(lm_object)).
I experimented with the model, x, y, and qr arguments of lm(). If I set them all to FALSE, the object size drops by 38%:
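library(MASS)
fit1 <- lm(medv ~ lstat, data = Boston)
size1 <- object.size(fit1)
print(size1, units = "Kb")
fit2 <- lm(medv ~ lstat, data = Boston, model = FALSE, x = FALSE, y = FALSE, qr = FALSE)
size2 <- object.size(fit2)
print(size2, units = "Kb")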
but summary() then fails, because the object no longer contains the qr component:
summary(fit2)
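The failure can be reproduced without aborting the script by wrapping the call in tryCatch(). This is only a minimal sketch on the Boston data; the names fit_no_qr and result are illustrative and not part of the original test.

```r
library(MASS)  # provides the Boston data set

# Same fit as above, with every optional component dropped, including qr.
fit_no_qr <- lm(medv ~ lstat, data = Boston,
                model = FALSE, x = FALSE, y = FALSE, qr = FALSE)

# The coefficients are still stored in the object...
print(coef(fit_no_qr))

# ...but summary() needs the QR decomposition, so capture the error
# message instead of letting it stop the script.
result <- tryCatch(
  summary(fit_no_qr),
  error = function(e) conditionMessage(e)
)
print(result)
```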
Apparently I need to keep qr=TRUE, which reduces the size of the object by only 9% compared to the default object:
fit3 <- lm(medv ~ lstat, data = Boston, model = FALSE, x = FALSE, y = FALSE, qr = TRUE)
size3 <- object.size(fit3)
print(size3, units = "Kb")
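The percentages quoted above can be recomputed with something like the following sketch, which refits the three variants, compares their total sizes, and lists the per-component sizes of the fit that keeps qr. The names fits, sizes, reduction_pct, and component_sizes are illustrative, and the exact figures will likely vary with the R version and the data.

```r
library(MASS)  # provides the Boston data set

# Fit the same model with the three argument combinations discussed above.
fits <- list(
  default = lm(medv ~ lstat, data = Boston),
  no_qr   = lm(medv ~ lstat, data = Boston,
               model = FALSE, x = FALSE, y = FALSE, qr = FALSE),
  keep_qr = lm(medv ~ lstat, data = Boston,
               model = FALSE, x = FALSE, y = FALSE, qr = TRUE)
)

# Total object size in bytes for each fit.
sizes <- sapply(fits, function(f) as.numeric(object.size(f)))

# Percentage reduction relative to the default object.
reduction_pct <- 100 * (1 - sizes / sizes[["default"]])
print(round(reduction_pct, 1))

# Per-component sizes (in bytes) of the fit that keeps qr, largest first,
# to see which components account for the remaining size.
component_sizes <- sapply(fits$keep_qr,
                          function(part) as.numeric(object.size(part)))
print(sort(component_sizes, decreasing = TRUE))
```

Components such as residuals, fitted.values, and qr hold one or more values per observation, so they are the ones expected to grow with the number of rows.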
How can I reduce the size of the "lm" object to a minimum, without keeping a large amount of unneeded information in memory and storage?
memory r lm
Cptnemo