Why does loading cached objects dramatically increase memory consumption compared to computing them?

Question

Why does loading cached objects dramatically increase memory consumption compared to computing them?

Relevant Background Information

I wrote a small piece of software that can be configured through a configuration file. The configuration file is parsed and converted into a structure of nested environments (for example, .HIVE$db is an environment, .HIVE$db$user = "Horst", .HIVE$db$pw = "my password", .HIVE$regex$date = some regex for dates, etc.).
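A minimal sketch of the nested-environment structure described above, assuming the parser ends up with one environment per configuration section. The object and key names mirror the question; the date regex is a placeholder, not the author's actual pattern.

```r
# Root environment holding the parsed configuration (name taken from the question).
.HIVE <- new.env()

# One nested environment per configuration section.
.HIVE$db <- new.env()
.HIVE$db$user <- "Horst"
.HIVE$db$pw <- "my password"

.HIVE$regex <- new.env()
# Placeholder pattern for ISO-style dates (assumption, not the original regex).
.HIVE$regex$date <- "^[0-9]{4}-[0-9]{2}-[0-9]{2}$"

# Each new.env() call above uses the default parent, parent.frame(),
# i.e. the environment in which this code is evaluated.
print(ls(.HIVE))
print(.HIVE$db$user)
```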

I wrote routines that handle these nested environments (for example, looking up the value "db/user" or "regex/date", changing it, etc.). The catch is that the initial parsing of the configuration files takes a long time and produces fairly large objects (three to four of them, between 4 and 16 MB each). So I thought: "No problem, let's just cache them by saving the objects to .Rdata files." This works, but loading the cached objects makes the RAM consumption of my Rterm process go through the roof (more than 1 GB!), and I still don't understand why. This does not happen when I compute the objects from scratch, but that is exactly what I'm trying to avoid, since it takes too long.
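The path-based lookup and the .RData caching step might look roughly like this. hive_get(), hive_set() and the temporary cache file are hypothetical stand-ins for the author's routines, assuming that path components are separated by "/".

```r
# Build a tiny configuration tree (same shape as in the question).
hive <- new.env()
hive$db <- new.env()
hive$db$user <- "Horst"

# Hypothetical helper: resolve a path such as "db/user" by walking nested environments.
hive_get <- function(root, path) {
  keys <- strsplit(path, "/", fixed = TRUE)[[1]]
  node <- root
  for (key in keys) {
    # inherits = FALSE: look only in this node, never in its parent environments
    node <- get(key, envir = node, inherits = FALSE)
  }
  node
}

# Hypothetical helper: set the value stored at a path (environments are modified in place).
hive_set <- function(root, path, value) {
  keys <- strsplit(path, "/", fixed = TRUE)[[1]]
  node <- root
  for (key in head(keys, -1)) {
    node <- get(key, envir = node, inherits = FALSE)
  }
  assign(tail(keys, 1), value, envir = node)
  invisible(root)
}

hive_set(hive, "db/pw", "my password")

# Cache the parsed object in an .RData file, then load it into a fresh environment.
cache_file <- tempfile(fileext = ".RData")
save(hive, file = cache_file)
restored <- new.env()
load(cache_file, envir = restored)
print(hive_get(restored$hive, "db/user"))
```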

I've already thought about serializing the objects instead, but I haven't tested it yet, since I would need to rework my code a bit. Also, I'm not sure whether loading serialized objects back into R would behave any differently from loading .Rdata files.
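In base R, serializing a single object is typically done with saveRDS() and readRDS(). Here is a minimal sketch, with a temporary file standing in for the cache. Keep in mind that these functions use the same serialization mechanism as save(). An environment is therefore still written together with its enclosing environments, so switching to them alone may not change the memory behavior.

```r
cfg <- new.env()
cfg$user <- "Horst"

rds_file <- tempfile(fileext = ".rds")
saveRDS(cfg, rds_file)          # writes one object, without its name
cfg_back <- readRDS(rds_file)   # read back and bind to any name you like

# Bytes the object occupies when serialized: a rough gauge of what gets written to disk.
n_bytes <- length(serialize(cfg, NULL))
print(n_bytes)
```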

Question

Can someone tell me why loading a previously computed object has such an effect on the memory consumption of my Rterm process (compared to computing it in every new process), and what is the best way to avoid this?

If needed, I'll try to come up with an example, although my exact scenario is a bit difficult to reproduce.


performance caching memory r

Rappster Oct 31 '11 at 16:26


2 answers

Without an example that others can reproduce, this will be difficult to answer. However, I'm doing something very similar to what you are doing, except that I use JSON files to store all my values. Instead of parsing the text myself, I use RJSONIO to convert everything to a list, and getting values out of a list is very simple. (You can convert it to a hash if you want, but it's nice to have layers of nested parameters.)

See this answer for an example of how I did this. If this works for you, you can skip both the expensive translation step and the memory blow-up.

(Taking a stab at the original question...) I suspect your problem is that you are using an environment rather than a list. Saving environments can be tricky in some contexts; saving lists is not a problem. Try using a list, or try converting to and from an environment. You can use the functions as.list() and as.environment() for this.
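A sketch of the list/environment round trip suggested above. env_to_list() and list_to_env() are hypothetical helpers that recurse into nested environments. They assume that the configuration values are not themselves lists, because a data frame, for example, would also be turned into an environment.

```r
# Hypothetical helper: recursively turn nested environments into nested lists.
env_to_list <- function(env) {
  lst <- as.list(env, all.names = TRUE, sorted = TRUE)
  lapply(lst, function(x) if (is.environment(x)) env_to_list(x) else x)
}

# Hypothetical helper: the reverse direction. as.environment() on a named list
# creates a new environment whose parent is emptyenv(), so no ancestors come along.
list_to_env <- function(lst) {
  as.environment(lapply(lst, function(x) if (is.list(x)) list_to_env(x) else x))
}

cfg <- new.env()
cfg$db <- new.env()
cfg$db$user <- "Horst"

cfg_list <- env_to_list(cfg)      # plain nested list, safe to save
cfg_env  <- list_to_env(cfg_list) # back to environments when needed
print(cfg_list$db$user)
```

Saving cfg_list rather than cfg keeps only the values themselves in the cache file.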


Iterator Oct 31 '11 at 17:19


G. Grothendieck · Accepted Answer · 2011-10-31T17:25:10+0000

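This is probably because environments carry their ancestors around. When an environment is saved, its enclosing (parent) environment is saved with it, then that environment's parent, and so on. The exceptions are special environments such as the global environment, which are stored only as references. If you don't need the information in the ancestors, set the parent of such environments to emptyenv() (or simply don't use environments if you don't need them).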
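A small experiment that illustrates the effect, assuming the configuration environment is created inside a parsing function that also holds large temporary data. make_config() and scratch are hypothetical names. The serialized size shows what save() would have to write.

```r
# Hypothetical parser step: the config environment is created inside a function
# that also holds a large temporary object.
make_config <- function() {
  scratch <- runif(1e6)   # ~8 MB of intermediate data, no longer needed afterwards
  cfg <- new.env()        # default parent = this function's evaluation frame
  cfg$user <- "Horst"
  cfg
}

cfg <- make_config()

# The frame (including 'scratch') is reachable as the parent of cfg, so it is serialized too.
bytes_with_parent <- length(serialize(cfg, NULL))

# Cut the link to the ancestors, as suggested above.
parent.env(cfg) <- emptyenv()
bytes_without_parent <- length(serialize(cfg, NULL))

print(c(bytes_with_parent, bytes_without_parent))
```

You can avoid the issue at its source by creating the nested environments with new.env(parent = emptyenv()) from the start. Also note that object.size() does not count the contents of an environment, so the length of the serialize() output is a better rough gauge here.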

Also note that formulas (and, of course, functions) have environments, so keep an eye on them too.
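The same effect applies to formulas and closures. Here is a sketch, again with hypothetical names (make_model_parts(), scratch). Both objects capture the frame of the function that created them. Pointing them at a small environment, here globalenv(), which is serialized only as a reference, drops the temporary data. A formula used for model fitting still needs to be able to reach its variables from whatever environment you choose.

```r
# Hypothetical example: a formula and a closure created inside a function both
# capture that function's frame, including any large temporaries in it.
make_model_parts <- function() {
  scratch <- runif(1e6)          # ~8 MB temporary
  list(
    formula = y ~ x,             # environment(formula) is this frame
    fun     = function(x) x + 1  # environment(fun) is this frame as well
  )
}

parts <- make_model_parts()
f <- parts$formula
g <- parts$fun

bytes_formula <- length(serialize(f, NULL))
bytes_fun     <- length(serialize(g, NULL))

# Point both at an environment that does not hold the temporary data.
environment(f) <- globalenv()
environment(g) <- globalenv()

bytes_formula_after <- length(serialize(f, NULL))
bytes_fun_after     <- length(serialize(g, NULL))
print(c(bytes_formula, bytes_formula_after, bytes_fun, bytes_fun_after))
```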
