How to create a large data frame in R without first creating a matrix and then converting it to a data.frame?

I need to create a matrix with 80,000 rows and 80,000 columns. But after reading on R-bloggers, I found out that the number of elements in a matrix cannot exceed 2^31 - 1. My plan to work around this for my particular algorithm is to use a data frame instead of a matrix. Is there a way to create an empty 80,000 x 80,000 data frame directly, without first creating a matrix and then converting it with as.data.frame, as shown below?

    myMatrix <- matrix(0, ncol = 40, nrow = 90)
    myDataFrame <- as.data.frame(myMatrix)
1 answer

You can create an empty data frame of size 80,000 x 80,000 as follows:

    dat <- do.call(data.frame, replicate(80000, rep(FALSE, 80000), simplify=FALSE))
    dim(dat)
    # [1] 80000 80000
    dat[1,1]
    # [1] FALSE
    dat[80000,80000]
    # [1] FALSE

Basically, you create a list containing each column of the data frame you want to build (I built the list with replicate using simplify=FALSE), and then you build a data frame from that list using do.call and data.frame.
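To see the construction at a size that runs instantly, here is a minimal sketch of the same list-then-do.call approach on a 5 x 5 case. The size `n` and the column names `V1` to `V5` are my own choices for illustration, not part of the original code; naming the list elements just gives the resulting columns readable names.

```r
# Small-scale sketch of the list-of-columns construction.
n <- 5  # hypothetical small size, chosen only for illustration

# One list element per column, each a logical vector of length n.
cols <- replicate(n, rep(FALSE, n), simplify = FALSE)
names(cols) <- paste0("V", seq_len(n))  # assumed column names

# do.call passes each list element to data.frame() as a separate argument,
# so every element becomes one column.
small <- do.call(data.frame, cols)

dim(small)
# [1] 5 5
```

The same code with `n <- 80000` is what the answer runs, with the memory and time costs noted below.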

A few notes:

  • You need several tens of gigabytes of RAM to hold this in memory (my R process shows 48 GB allocated).
  • This will be much slower than allocating a matrix; for the 8,000 x 8,000 case, building the data frame took 36 seconds, while building the matrix took 1 second. The full 80,000 x 80,000 data frame took 54 minutes to allocate.
  • If your data is sparse, this is a wasteful option, and you should use a sparse matrix.
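As a rough illustration of the sparse alternative, the sketch below uses `sparseMatrix` from the Matrix package (a recommended package that normally ships with R). The positions and values of the non-zero entries are made up for the example; only the dimensions come from the question. The first line is a back-of-the-envelope estimate of the dense storage, assuming 4 bytes per logical element and ignoring any overhead or copies.

```r
library(Matrix)  # assumed to be installed; it is a recommended R package

n <- 80000

# Rough dense-storage estimate for logical columns: 4 bytes per element.
n * n * 4 / 2^30   # about 23.8 GiB before any overhead or copies

# Sparse storage keeps only the non-zero entries.
# These three entries are hypothetical, purely for illustration.
i <- c(1, 500, 80000)    # row indices
j <- c(1, 42, 80000)     # column indices
v <- c(3.5, 1, 2)        # values at those positions
s <- sparseMatrix(i = i, j = j, x = v, dims = c(n, n))

dim(s)
# [1] 80000 80000
s[500, 42]   # a stored entry
# [1] 1
s[2, 2]      # an entry that was never stored reads as zero
# [1] 0
```

Whether this helps depends entirely on how many entries of your data are actually non-zero.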

Although allocating a matrix of this size did not fail on 64-bit Linux (R version 3.2.0), basic operations on it do not work:

    x <- matrix(0, nrow=80000, ncol=80000)
    dim(x)
    # [1] 80000 80000
    x[1,1]
    # Error: long vectors not supported yet: subset.c:733
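The error message mentions long vectors, which is R's term for vectors with more than 2^31 - 1 elements. A quick check in base R confirms that a matrix of this size falls into that range; the variable name `n_elements` is mine, for illustration only.

```r
# Number of cells in an 80,000 x 80,000 matrix (computed as a double).
n_elements <- 80000 * 80000

# .Machine$integer.max is 2^31 - 1, the largest ordinary vector length.
.Machine$integer.max
# [1] 2147483647
n_elements > .Machine$integer.max
# [1] TRUE
```

The subsetting error shown above is from R 3.2.0; I have not checked how other versions behave.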