Protection stack overflow error with fread

I am using fread in data.table (1.8.8, R 3.0.1), trying to read very large files.

The file in question has 313 rows and ~6.6 million columns of numeric data, and is about 12 GB. This is on CentOS 6.4 with 512 GB of RAM.

When I try to read in a file:

 g=fread('final.results',header=T,sep=' ')
 'header' changed by user from 'auto' to TRUE
 Error: protect(): protection stack overflow

I tried starting R with --max-ppsize=500000, which is the maximum, but I get the same error.

I also tried setting the stack size to unlimited with

 ulimit -s unlimited 

Virtual memory was already set to unlimited.

Am I being unrealistic with a file of this size? Am I missing something obvious?

r data.table large-data
1 answer

Now fixed in v1.8.9 on R-Forge.

  • Unintended 500,000 column limit in fread removed. Thanks to mpmorley for reporting. Test added.

The cause was this part of the fread.c source:

 // *********************************************************************
 // Allocate columns for known nrow
 // *********************************************************************
 ans=PROTECT(allocVector(VECSXP,ncol));
 protecti++;
 setAttrib(ans,R_NamesSymbol,names);
 for (i=0; i<ncol; i++) {
     thistype = TypeSxp[ type[i] ];
     thiscol = PROTECT(allocVector(thistype,nrow));   // ** HERE **
     protecti++;
     if (type[i]==SXP_INT64) setAttrib(thiscol, R_ClassSymbol, ScalarString(mkChar("integer64")));
     SET_TRUELENGTH(thiscol, nrow);
     SET_VECTOR_ELT(ans,i,thiscol);
 }

According to R-exts section 5.9.1, that PROTECT inside the loop is not needed:

In some cases it is necessary to keep better track of whether protection is really needed. Be particularly aware of situations where a large number of objects are generated. The pointer protection stack has a fixed size (default 10,000) and can become full. It is not a good idea then to just PROTECT everything in sight and UNPROTECT several thousand objects at the end. It will almost invariably be possible to either assign the objects as part of another object (which automatically protects them) or unprotect them immediately after use.

So that PROTECT has now been removed, and all is well. (It looks like the pointer protection stack limit has been raised to 50,000 since that text was written: Defn.h contains #define R_PPSSIZE 50000L.) I checked all the other PROTECTs in the data.table C source for anything similar, and found and fixed one in assign.c too (triggered when adding more than 50,000 columns by reference). There are no others.
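
To illustrate why per-column protection hits the limit while assigning into the already-protected list does not, here is a minimal toy model in plain C. It is only a sketch: the counter stands in for R's pointer protection stack, and every name in it (TOY_PPSIZE, toy_protect, toy_unprotect, peak_protect_each, peak_assign_to_parent) is hypothetical and not part of R or data.table.

```c
/* Toy model only: a counter standing in for R's pointer protection stack.
   TOY_PPSIZE mirrors the R_PPSSIZE value quoted above; none of these
   names exist in R or data.table. */
#define TOY_PPSIZE 50000L

static long toy_depth = 0;   /* current number of protected pointers */

/* Push one entry; returns 0 on success, -1 if the stack is full. */
static int toy_protect(void) {
    if (toy_depth >= TOY_PPSIZE) return -1;
    toy_depth++;
    return 0;
}

/* Pop n entries. */
static void toy_unprotect(long n) {
    toy_depth -= n;
}

/* Old pattern: protect the list and every column, unprotect at the end.
   Returns the peak stack depth, or -1 if the stack overflowed. */
long peak_protect_each(long ncol) {
    long used = 0;
    if (toy_protect() != 0) return -1;       /* the list itself */
    used++;
    for (long i = 0; i < ncol; i++) {
        if (toy_protect() != 0) {            /* one slot per column */
            toy_unprotect(used);
            return -1;
        }
        used++;
    }
    long peak = toy_depth;
    toy_unprotect(used);
    return peak;
}

/* New pattern: protect only the list; each column is stored into the
   list right away, so it needs no stack slot of its own.
   Returns the peak stack depth, or -1 if the stack overflowed. */
long peak_assign_to_parent(long ncol) {
    if (toy_protect() != 0) return -1;       /* the list itself */
    long peak = toy_depth;
    for (long i = 0; i < ncol; i++) {
        /* column goes straight into the protected list: depth unchanged */
        if (toy_depth > peak) peak = toy_depth;
    }
    toy_unprotect(1);
    return peak;
}
```

In this model, peak_protect_each fails as soon as the column count approaches the stack size, whereas peak_assign_to_parent uses a single slot regardless of how many columns there are, which is the behaviour the fix relies on.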

Thanks for reporting!
