R: "long vectors (argument 5) are not supported in .C"

Question

I have a very large matrix that I am trying to run through glmnet on a server with plenty of memory. It works fine, even on very large datasets, up to a certain point, after which I get the following error:

Error in elnet(x, ...) : long vectors (argument 5) are not supported in .C

If I understand correctly, this is caused by a limitation in R, which cannot have a vector longer than INT_MAX. Is that right? Are there any solutions for this that do not require a complete rewrite of glmnet? Do any of the alternative R interpreters (Riposte, etc.) share this limitation?

Thanks!

+10

vector r scalability bigdata glmnet

Danny Dec 08 '15 at 20:42

2 answers

In ?"long vector" there is a note that says:

However, compiled code typically needs quite extensive changes. Note that the .C and .Fortran interfaces do not accept long vectors, so .Call (or similar) has to be used.

elnet calls .Fortran. You will need to change the function to use .Call, perhaps through a C wrapper that calls the Fortran code, and possibly rewrite and recompile the corresponding Fortran code so that it can handle long vectors.

+2

James Oct 19 '16 at 10:43

Tomas Kalibera · Accepted Answer · 2016-10-19T10:11:12+0000

Since version 3.0.0, R supports long vectors. A long vector is indexed by a double. A long vector can be the basis for a matrix or array with more than 2^31 - 1 elements, as long as each dimension is small enough to be indexed by an integer. Long vectors cannot be passed to native code via .C and .Fortran. The error message you receive occurs because a long vector is being passed through .C.
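To make the size limit concrete, here is a minimal plain-C sketch. It is not taken from R or glmnet, and the function names are hypothetical. It shows how a matrix whose dimensions each fit in a 32-bit int can still need a backing vector longer than INT_MAX, and why offsets into it must be computed in 64 bits.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Total element count of an nrow x ncol matrix, computed in 64 bits.
   Each dimension is an ordinary int, as in the answer's description. */
int64_t matrix_length(int nrow, int ncol)
{
    assert(nrow >= 0 && ncol >= 0);
    return (int64_t)nrow * (int64_t)ncol;
}

/* Nonzero when a vector of this length would be a "long vector",
   i.e. it has more than INT_MAX elements. */
int is_long_vector(int64_t length)
{
    return length > (int64_t)INT_MAX;
}

/* Zero-based column-major offset of element (i, j).
   The operands are widened before multiplying so the product
   cannot overflow a 32-bit int. */
int64_t colmajor_offset(int i, int j, int nrow)
{
    assert(i >= 0 && j >= 0 && i < nrow);
    return (int64_t)i + (int64_t)j * (int64_t)nrow;
}
```

For example, a 50000 x 50000 matrix has 2.5 billion elements, so `is_long_vector(matrix_length(50000, 50000))` is nonzero even though both dimensions are far below INT_MAX.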

Long vectors can be passed through .Call. Thus, as long as glmnet's native code can support long vectors (64-bit indices), or can be modified and recompiled to support them, you only need to change the interface between R and glmnet's native code. You can do this manually in C, and there is also a new package for this task called dotCall64. Part of modifying the interface is deciding when to copy the arguments: .C / .Fortran copy defensively, but you do not want to copy large data structures unless it is necessary.

I think the difficulty of modifying glmnet's native code to support 64-bit indices depends on the actual code (which I have only looked at, not tried to modify). It is easy to switch all integers (whether explicitly or implicitly 32-bit) in Fortran code to 64-bit. Problems arise when some integers must remain 32-bit, and this will happen, e.g., for integer vectors passed to and from R code, since R uses 32-bit integers (even in long vectors). Glmnet has such integer vectors. How complicated the modification is then depends on how clean the Fortran source code is (for example, whether it uses separate integer variables for indexing and for accessing the values of integer arrays, etc.).
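The mixed-width problem described above can be illustrated with a small plain-C sketch. This is an assumption-laden illustration, not the approach used by glmnet or dotCall64: the function names are hypothetical, and it only shows the general idea of widening 32-bit integer data for 64-bit native code and narrowing results back with a range check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Widen 32-bit integers (the width R uses for integer vectors)
   into a caller-provided 64-bit buffer of the same length. */
void widen_i32_to_i64(const int32_t *src, int64_t *dst, size_t n)
{
    assert(src != NULL && dst != NULL);
    for (size_t k = 0; k < n; k++)
        dst[k] = (int64_t)src[k];
}

/* Narrow 64-bit results back to 32 bits.
   Returns 0 on success, or -1 as soon as a value does not fit
   (dst may then be only partially written). */
int narrow_i64_to_i32(const int64_t *src, int32_t *dst, size_t n)
{
    assert(src != NULL && dst != NULL);
    for (size_t k = 0; k < n; k++) {
        if (src[k] < INT32_MIN || src[k] > INT32_MAX)
            return -1;
        dst[k] = (int32_t)src[k];
    }
    return 0;
}
```

A wrapper along these lines would widen integer arguments before calling 64-bit native code and narrow any integer outputs afterwards. The range check matters because a value such as an index above INT_MAX cannot be returned to R as an ordinary integer.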

Experimental implementations of subsets of R, such as Riposte, will not help.