R: Any other solution to “cannot allocate vector of size n Mb”?

My problem involves simple calculations on a large data set (about 25 million rows and 10 columns, i.e. around 1 GB of data). My system:

32-bit Windows 7, 4 GB RAM, RStudio 0.96, R 2.15.2

I can attach my data using the bigmemory package and run functions on it. I can also do this with the ff package, filehash, etc.

The problem arises when computing simple summaries (unique values, means, etc.). I get the typical error

"cannot allocate vector of size n Mb"

where n can be as small as 70 to 95 Mb.

I know (I think) all the solutions suggested so far for this:

Increase RAM; launch R with the command-line option "--max-mem-size=XXXX"; use the memory.limit() and memory.size() commands; use rm() and gc(); work on 64-bit; close other programs, free memory, reboot; use the packages bigmemory, ff, filehash, SQL, etc.; slim down your data (use integers, shorts, etc.); check the memory usage of intermediate calculations; and so on.

All of this has been checked and tried (except switching to another system or machine, obviously).

But I still get "cannot allocate vector of size n Mb", where n is about 90 Mb for example, with practically no memory in use by R or other programs, even after all the restarts and updates. I know the difference between free memory and what Windows and R report as allocated, but this makes no sense, because more than 3 GB of memory is available.

I suspect the cause lies in how memory is managed for R on 32-bit Windows, but it seems almost a joke to have to buy 4 GB of RAM or switch the entire system to 64-bit just to allocate 70 MB.

Is there something I am missing?

+7
2 answers

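The problem is that R is trying to allocate 90 Mb of contiguous space. Unfortunately, after many operations the memory may be too fragmented to contain a single free block of that size, even when the total free memory is much larger. A 32-bit process on Windows also has a limited address space (2 GB by default), so the 3 GB you see as available is not all usable by R.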

If possible, try restructuring your code to work on small chunks of data at a time.
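As a rough illustration of the chunked approach, here is a minimal sketch in base R that reads a CSV file a block of lines at a time and keeps only running totals and the set of unique values. The file, the column names `id` and `value`, and the chunk size are all made up for the demo; adapt them to your data.

```r
# Write a small demo file so the sketch is self-contained (hypothetical data).
path <- tempfile(fileext = ".csv")
set.seed(1)
demo <- data.frame(id = sample(1:50, 10000, replace = TRUE),
                   value = runif(10000))
write.csv(demo, path, row.names = FALSE)

chunk_rows <- 1000                      # rows per chunk; tune to your memory
con <- file(path, open = "r")
header <- gsub('"', "", strsplit(readLines(con, n = 1), ",")[[1]])

total <- 0                              # running sum of `value`
n <- 0                                  # running row count
seen <- integer(0)                      # unique ids seen so far

repeat {
  lines <- readLines(con, n = chunk_rows)
  if (length(lines) == 0) break         # end of file
  chunk <- read.csv(text = lines, header = FALSE, col.names = header)
  total <- total + sum(chunk$value)
  n <- n + nrow(chunk)
  seen <- union(seen, chunk$id)
}
close(con)

chunk_mean <- total / n                 # mean computed without loading the full file
```

Only one chunk is ever held in memory, so no single allocation needs to be anywhere near the size of the full data set. Statistics that cannot be built from running totals (medians, for instance) need a different strategy.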

If you are trying to perform simple calculations like the ones you mentioned (for example, maxima, means, etc.), you can try biganalytics, which lets you perform a number of operations on big.matrix objects.
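A minimal sketch of what that might look like, assuming bigmemory and biganalytics are installed. The small matrix here is a hypothetical stand-in; with real data you would attach or create a file-backed big.matrix instead.

```r
library(bigmemory)
library(biganalytics)

# Hypothetical stand-in for the real data: a small big.matrix of doubles.
x <- big.matrix(nrow = 1000, ncol = 2, type = "double", init = 0)
x[, 1] <- as.numeric(seq_len(1000))
x[, 2] <- rep(c(1, 2), 500)

# Column summaries computed on the big.matrix itself,
# without first converting it to an ordinary R matrix.
col_means <- colmean(x)
col_maxes <- colmax(x)
```

The point of going through `colmean()` and `colmax()` is that the data stays in the big.matrix, so R does not have to allocate a full copy as a regular vector.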

Otherwise, as far as I know, there is not much else you can do short of moving to a 64-bit OS and 64-bit R.

+4

Look at the ff package on CRAN. It “tricks” R by mapping the data to a flat file on disk instead of holding it in RAM. It works pretty well for importing data. You can also use the ffbase package to perform simple, efficient calculations on ff objects.
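A minimal sketch of the idea, assuming the ff package is installed. The vector below is hypothetical demo data, and with something this small `chunk()` may well return a single chunk; the loop is written the same way either way.

```r
library(ff)

# Hypothetical stand-in data, copied into a disk-backed ff vector.
v <- as.numeric(1:100000)
x <- ff(v)

total <- 0
count <- 0
for (i in chunk(x)) {        # chunk() yields index ranges sized to fit in RAM
  block <- x[i]              # only this block is pulled into memory
  total <- total + sum(block)
  count <- count + length(block)
}
ff_mean <- total / count
```

Each pass only materialises one block as an ordinary R vector, so the largest allocation is bounded by the chunk size rather than by the length of the data.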

+2
