R: Strange behavior when saving a list with save() from inside a function

I am currently facing a weird problem when saving lists and sublists with R. The title may not be very explicit, but here is what bothers me:

Given some data (here the data is completely artificial, but the problem is not the relevance of the model):

    set.seed(1)
    a0 = rnorm(10000,10,2)
    b1 = rnorm(10000,10,2)
    b2 = rnorm(10000,10,2)
    b3 = rnorm(10000,10,2)
    data = data.frame(a0,b1,b2,b3)

And a function that returns a list of complex objects (say lm() objects):

    test = function(k){
      tt = vector('list',k)
      for(i in 1:k) tt[[i]] = lm(a0~b1+b2+b3,data = data)
      tt
    }

Our test function returns a list of lm() objects. Let's see the size of this object:

    > ok = test(2)
    > object.size(ok)
    4019336 bytes

Now create ok2, exactly the same object, but not inside a function:

    ok2 = vector('list',2)
    ok2[[1]] = lm(a0~b1+b2+b3,data = data)
    ok2[[2]] = lm(a0~b1+b2+b3,data = data)

... and check its size:

    > object.size(ok2)
    4019336 bytes

Here, ok and ok2 appear to be exactly the same, and R tells us so. The problem shows up when we save these objects to the hard drive as R objects (using save() or saveRDS()):

    save(ok,file='ok.RData')
    save(ok2,file='ok2.RData')

Their sizes on the hard drive: 3 366 005 bytes and 1 678 851 bytes. ok is 2 times bigger than ok2, even though they are exactly alike!

Even stranger, if you keep just one element of each list, say ok[[1]] and ok2[[1]] (the objects are again completely identical):

    a = ok[[1]]
    a2 = ok2[[1]]
    save(a,file='console/a.RData')
    save(a2,file='console/a2.RData')

Their sizes on the hard drive, respectively: 2 523 284 bytes and 838 977 bytes.

A few questions: Why is the size of a different from the size of a2 on the hard drive? Why is ok different from ok2 on the hard drive? And why does a, which is exactly half of ok, take 2 523 284 bytes on disk when ok takes only 3 366 005 bytes?

Did I miss something?

PS: I tested this under Windows 7 32-bit with R 2.15.1, 2.15.2, 2.15.3 and 3.0.0, and under Debian with R 2.15.1 and R 2.15.2. I get this problem every time.

EDIT

Thanks to @user1609452, here is a little trick that seems to work:

    test2 = function(k){
      tt = vector('list',k)
      for(i in 1:k){
        tt[[i]] = lm(a0~b1+b2+b3,data = data)
        attr(tt[[i]]$terms,".Environment") = .GlobalEnv
        attr(attr(tt[[i]]$model,"terms"),".Environment") = .GlobalEnv
      }
      tt
    }

Formula objects carry their own environment, along with everything in it. Setting that environment to NULL or to .GlobalEnv seems to work: functions like predict.lm() still work, and our saved objects have the right size on the hard drive. Not sure why.
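To check the effect of the trick without writing files, one can compare the serialized size of a fit before and after resetting the environment. This is only a minimal sketch on smaller artificial data: fit_in_function, strip_env and junk are hypothetical names introduced for illustration, and length(serialize(x, NULL)) is used as a stand-in for the uncompressed size that save() would have to write.

```r
set.seed(1)
d <- data.frame(y = rnorm(1000), x = rnorm(1000))

# Hypothetical stand-in for test(): the fit is created inside a function
# that also holds a large local object.
fit_in_function <- function() {
  junk <- rnorm(1e5)          # large local, plays the role of tt
  lm(y ~ x, data = d)
}

# Hypothetical helper applying the trick from the EDIT above.
strip_env <- function(fit) {
  attr(fit$terms, ".Environment") <- globalenv()
  attr(attr(fit$model, "terms"), ".Environment") <- globalenv()
  fit
}

fit <- fit_in_function()
before <- length(serialize(fit, NULL))             # includes junk
after  <- length(serialize(strip_env(fit), NULL))  # environment no longer dragged along
c(before = before, after = after)

# Prediction still works because x is supplied in newdata.
predict(strip_env(fit), newdata = data.frame(x = 0))
```

One caveat, stated as an assumption rather than something tested here: if the formula referred to variables or helper functions that exist only inside the function, resetting the environment could make later calls that re-evaluate the formula fail to find them.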

1 answer

Take a look at the environment attached to the terms of each fit:

    > attr(ok[[1]]$terms,".Environment")
    <environment: 0x9bcf3f8>
    > attr(ok2[[1]]$terms,".Environment")
    <environment: R_GlobalEnv>

and

    > ls(envir = attr(ok[[1]]$terms,".Environment"))
    [1] "i"  "k"  "tt"

So ok carries around the environment of the function call that created it, and save() writes that environment, including tt, to disk along with the object.
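The same capture can be seen without lm() at all. The following is a minimal sketch, with make_formula and big as hypothetical names: a formula written inside a function body records the environment of that call, so every local variable of the call stays reachable from the returned formula.

```r
# Hypothetical example function: the formula never uses `big`,
# but it is created in the same call, so it captures that environment.
make_formula <- function() {
  big <- rnorm(1e5)
  y ~ x
}

f <- make_formula()
env <- environment(f)          # same as attr(f, ".Environment")

ls(env)                        # the local variable is still there
identical(env, globalenv())    # FALSE: it is the function call's environment
```

lm() copies this environment onto the terms object of the fit, which is why ok[[1]]$terms points at the environment of the test() call.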

Also read ?object.size, which explains why object.size() does not see the difference:

  The calculation is of the size of the object, and excludes the space needed to store its name in the symbol table. Associated space (eg the environment of a function and what the pointer in a 'EXTPTRSXP' points to) is not included in the calculation. 
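A small sketch of that difference, assuming that length(serialize(x, NULL)) is a fair proxy for what save() writes before compression (make_formula is a hypothetical name): two formulas that differ only in the size of a captured local variable report the same object.size(), but serialize to very different lengths.

```r
# Hypothetical example function: n controls the size of a local vector
# that the returned formula captures through its environment.
make_formula <- function(n) {
  big <- rnorm(n)
  y ~ x
}

f_small <- make_formula(10)
f_big   <- make_formula(1e5)

# object.size() excludes the contents of the attached environment ...
c(object.size(f_small), object.size(f_big))

# ... while serialization follows the environment and writes its contents.
c(length(serialize(f_small, NULL)), length(serialize(f_big, NULL)))
```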

For example, define test2 and ok3:

    test2 = function(k){
      tt = vector('list',k)
      for(i in 1:k) tt[[i]] = lm(a0~b1+b2+b3,data = data)
      rr = tt
      tt
    }
    ok3 <- test2(2)
    save(ok3, file = 'ok3.RData')
    > file.info('ok3.RData')$size
    [1] 5043933
    > file.info('ok.RData')$size
    [1] 3366005
    > file.info('ok2.RData')$size
    [1] 1678851
    > ls(envir = attr(ok3[[1]]$terms,".Environment"))
    [1] "i"  "k"  "rr" "tt"

So ok is approximately twice as large as ok2 on disk because it carries the additional tt, and ok3 is approximately three times as large because it carries both tt and rr. object.size() still reports the same size for all three:

    > c(object.size(ok),object.size(ok2),object.size(ok3))
    [1] 4019336 4019336 4019336
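To see what a saved fit would drag along, one can list the objects in the captured environment together with their sizes. This is a rough sketch on smaller artificial data; fit_many and d are hypothetical names that mirror test2 above.

```r
set.seed(1)
d <- data.frame(y = rnorm(1000), x = rnorm(1000))

# Hypothetical analogue of test2(): rr is a second name for the same list.
fit_many <- function(k) {
  tt <- vector("list", k)
  for (i in seq_len(k)) tt[[i]] <- lm(y ~ x, data = d)
  rr <- tt
  tt
}

fits <- fit_many(2)
env <- attr(fits[[1]]$terms, ".Environment")

# Size of every object living in the captured environment.
sizes <- vapply(
  ls(env),
  function(nm) as.numeric(object.size(get(nm, envir = env))),
  numeric(1)
)
sizes
```

Here rr and tt report the same size, which is consistent with the saved file growing by roughly one more copy of the list for each extra local that holds it.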

There is a related discussion here
