I am currently facing a rather weird problem when saving lists and their elements with R. The title may not be very explicit, but here is what bothers me:
Given some data (the data here is completely artificial; the relevance of the model is not the point):
set.seed(1)
a0 = rnorm(10000, 10, 2)
b1 = rnorm(10000, 10, 2)
b2 = rnorm(10000, 10, 2)
b3 = rnorm(10000, 10, 2)
data = data.frame(a0, b1, b2, b3)
And a function that returns a list of complex objects (say lm() objects):
test = function(k){
  tt = vector('list', k)
  for(i in 1:k)
    tt[[i]] = lm(a0 ~ b1 + b2 + b3, data = data)
  tt
}
Our test function returns a list of lm() objects. Let's look at the size of this object:
> ok = test(2)
> object.size(ok)
4019336 bytes
Now create ok2, exactly the same object, but built outside the function:
ok2 = vector('list', 2)
ok2[[1]] = lm(a0 ~ b1 + b2 + b3, data = data)
ok2[[2]] = lm(a0 ~ b1 + b2 + b3, data = data)
... and check its size:
> object.size(ok2)
4019336 bytes
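A caveat about this comparison: the object.size() help page says that associated space, such as the environment of a function, is not included in the estimate, and in practice the same holds for the environment a formula carries. The toy sketch below (the names e_big, e_small, f and g are made up for illustration) builds two identical formulas that differ only in the environment attached to them. object.size() reports the same size for both, while serialize(), which produces the format saveRDS() writes, includes the contents of the non-global environment.

```r
# Two environments: one holding a large vector, one empty
e_big   <- new.env()
e_small <- new.env()
assign("big", numeric(1e6), envir = e_big)  # one million doubles, about 8 MB

# Two identical formulas that differ only in the environment they carry
f <- y ~ x
g <- y ~ x
environment(f) <- e_big
environment(g) <- e_small

# object.size() does not look inside the environments: same size for both
print(object.size(f))
print(object.size(g))

# serialize() writes non-global environments out in full
length(serialize(f, NULL))
length(serialize(g, NULL))
```

Expect the serialized f to be roughly 8 MB larger than g, which is the size of the vector stored in e_big.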
So ok and ok2 appear to be exactly the same, and object.size() agrees. The problem appears when we save these objects to the hard drive as R objects (using save() or saveRDS()):
save(ok, file = 'ok.RData')
save(ok2, file = 'ok2.RData')
Their sizes on disk: 3,366,005 bytes and 1,678,851 bytes. ok is twice as big as ok2, even though they are exactly alike!
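The same effect can be reproduced without touching the disk. A minimal sketch, rebuilding the question's data, test(), ok and ok2 so the block runs on its own: serialize(x, NULL) returns the serialized bytes as an uncompressed raw vector, so the absolute numbers will differ from the compressed .RData files, but the ratio between ok and ok2 should be close to the factor of 2 seen on disk.

```r
# Rebuild the data, test() and the two lists from the question
set.seed(1)
a0 = rnorm(10000, 10, 2)
b1 = rnorm(10000, 10, 2)
b2 = rnorm(10000, 10, 2)
b3 = rnorm(10000, 10, 2)
data = data.frame(a0, b1, b2, b3)

test = function(k){
  tt = vector('list', k)
  for(i in 1:k)
    tt[[i]] = lm(a0 ~ b1 + b2 + b3, data = data)
  tt
}

ok = test(2)
ok2 = vector('list', 2)
ok2[[1]] = lm(a0 ~ b1 + b2 + b3, data = data)
ok2[[2]] = lm(a0 ~ b1 + b2 + b3, data = data)

# Same size in memory according to object.size() ...
object.size(ok) == object.size(ok2)

# ... but not once serialized (uncompressed: only the ratio is comparable
# with the .RData files)
size_ok  = length(serialize(ok, NULL))
size_ok2 = length(serialize(ok2, NULL))
size_ok / size_ok2
```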
Even stranger, if we save just one element of each list, say ok[[1]] and ok2[[1]] (again, the two objects are completely identical):
a = ok[[1]]
a2 = ok2[[1]]
save(a, file = 'console/a.RData')
save(a2, file = 'console/a2.RData')
Their sizes on disk are, respectively, 2,523,284 bytes and 838,977 bytes.
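A quick back-of-the-envelope check on these numbers, taking the size of a2.RData (a single model saved on its own) as the unit: every file turns out to be close to a whole number of models. This only uses the sizes reported above; the counts it suggests are an interpretation, not something measured directly.

```r
# File sizes on disk reported above, in bytes
sizes = c(ok2 = 1678851, a = 2523284, ok = 3366005)

# a2.RData holds a single model fitted at top level: use it as the unit
one_model = 838977

# Number of "single-model equivalents" in each file: roughly 2, 3 and 4
round(sizes / one_model, 2)
```

One reading consistent with these counts: ok2 holds its two models; a holds itself plus the list tt (two models) still referenced from the frame of the test() call in which it was fitted; ok holds its two models plus that same frame, written only once because serialization stores each environment a single time.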
So, three questions: why does a take a different amount of space on disk than a2? Why does ok differ from ok2 on disk? And why does a, which is exactly half of ok, take 2,523,284 bytes on disk while ok takes 3,366,005 bytes?
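One way to look for the answer is to check which environment the formula stored in a actually points to. A minimal sketch, rebuilding the question's data and test() so it runs on its own: environment() applied to a non-function returns its ".Environment" attribute, so environment(a$terms) gives the environment recorded in the model's terms.

```r
# Rebuild the data and test() from the question
set.seed(1)
a0 = rnorm(10000, 10, 2)
b1 = rnorm(10000, 10, 2)
b2 = rnorm(10000, 10, 2)
b3 = rnorm(10000, 10, 2)
data = data.frame(a0, b1, b2, b3)

test = function(k){
  tt = vector('list', k)
  for(i in 1:k)
    tt[[i]] = lm(a0 ~ b1 + b2 + b3, data = data)
  tt
}

ok = test(2)
a  = ok[[1]]
a2 = lm(a0 ~ b1 + b2 + b3, data = data)  # fitted at top level, like ok2[[1]]

# Environment recorded in the terms of each model
env_a  = environment(a$terms)
env_a2 = environment(a2$terms)

identical(env_a2, globalenv())  # TRUE when run at top level
identical(env_a, globalenv())   # FALSE: the frame of the test() call

# That frame still holds test()'s local variables, including the full list tt
ls(env_a)                         # "i" "k" "tt"
length(get("tt", envir = env_a))  # 2 fitted models
```

Since serialization writes a non-global environment out in full (the global environment is only stored as a reference), saving a would then also save everything in that frame.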
Did I miss something?
PS: I ran this test on Windows 7 (32-bit) with R 2.15.1, 2.15.2, 2.15.3 and 3.0.0, and on Debian with R 2.15.1 and 2.15.2. I get this problem every time.
EDIT
Thanks to @user1609452, here is a small trick that seems to work:
test2 = function(k){
  tt = vector('list', k)
  for(i in 1:k){
    tt[[i]] = lm(a0 ~ b1 + b2 + b3, data = data)
    # point both formula environments at .GlobalEnv instead of this function's frame
    attr(tt[[i]]$terms, ".Environment") = .GlobalEnv
    attr(attr(tt[[i]]$model, "terms"), ".Environment") = .GlobalEnv
  }
  tt
}
Formula objects carry their own environment with them, among other things. Setting it to NULL or to .GlobalEnv seems to work: functions like predict.lm() still work, and the saved objects have the expected size on disk. I'm not sure why, though.
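The same trick can also be applied after the fact. A minimal sketch: reset_lm_env() is a hypothetical helper name, not an existing function; it does to a single lm fit what test2() does inside its loop. The block then compares serialized sizes and checks that predict() still works on new data.

```r
# Rebuild the data and test() from the question
set.seed(1)
a0 = rnorm(10000, 10, 2)
b1 = rnorm(10000, 10, 2)
b2 = rnorm(10000, 10, 2)
b3 = rnorm(10000, 10, 2)
data = data.frame(a0, b1, b2, b3)

test = function(k){
  tt = vector('list', k)
  for(i in 1:k)
    tt[[i]] = lm(a0 ~ b1 + b2 + b3, data = data)
  tt
}

# Hypothetical helper: repoint both formula environments of an lm fit
reset_lm_env = function(fit, env = globalenv()){
  attr(fit$terms, ".Environment") = env
  attr(attr(fit$model, "terms"), ".Environment") = env
  fit
}

ok = test(2)
ok_fixed = lapply(ok, reset_lm_env)

# Roughly half the serialized size of the original list
length(serialize(ok_fixed, NULL)) / length(serialize(ok, NULL))

# predict() still works, rebuilding the model frame from newdata
predict(ok_fixed[[1]], newdata = data[1:5, ])
```

Pointing the formulas at globalenv() is harmless here because lm() received all of its variables through data. A formula that refers to variables living only inside the function's frame would no longer find them later, for instance when predict() rebuilds the model frame from a newdata that lacks them.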