You can create an 80,000 x 80,000 data frame (filled with FALSE) as follows:
```r
dat <- do.call(data.frame, replicate(80000, rep(FALSE, 80000), simplify=FALSE))
dim(dat)          # [1] 80000 80000
dat[1,1]          # [1] FALSE
dat[80000,80000]  # [1] FALSE
```
Basically, you create a list containing each column of the data frame you want to build (I built the list with replicate, using simplify=FALSE), and then build a data frame from it using do.call and data.frame.
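To see the list-then-do.call pattern without needing tens of gigabytes, here is a scaled-down sketch. The size n = 1000 is a hypothetical choice so it runs quickly, and naming the list elements before calling data.frame is an addition of mine (it gives predictable column names V1, V2, ...), not part of the original answer.

```r
# Scaled-down version of the same do.call/data.frame technique.
n <- 1000  # hypothetical small size so the example runs quickly

# One all-FALSE logical vector per column, kept as a list (simplify = FALSE).
cols <- replicate(n, rep(FALSE, n), simplify = FALSE)

# Assumption/addition: name the columns V1..Vn up front.
names(cols) <- paste0("V", seq_len(n))

# Each list element becomes one argument to data.frame(), i.e. one column.
dat <- do.call(data.frame, cols)

print(dim(dat))   # [1] 1000 1000
print(dat[1, 1])  # [1] FALSE
print(dat[n, n])  # [1] FALSE
```

Because each column is a separate vector of length n, no single vector ever holds all n * n values.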
A few notes:
- You need several tens of gigabytes of RAM to hold this object in memory (my R process showed 48 GB of allocated memory).
- This is much slower than allocating a matrix: for an 8,000 x 8,000 example, building the data frame took 36 seconds, while building the matrix took 1 second. Building the full 80,000 x 80,000 data frame took 54 minutes.
- If your data are sparse, this approach is wasteful, and you should use a sparse matrix instead.
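For the sparse case, a minimal sketch using the Matrix package (a recommended package shipped with standard R installations; the answer itself does not name a specific sparse-matrix package). The two TRUE positions below are hypothetical data chosen only for illustration; every other cell is an implicit FALSE that takes no storage.

```r
library(Matrix)  # assumption: using Matrix::sparseMatrix for the sparse case

n <- 80000
# Hypothetical data: only two TRUE entries in an 80,000 x 80,000 grid.
m <- sparseMatrix(i = c(1, n), j = c(1, n), x = c(TRUE, TRUE), dims = c(n, n))

print(dim(m))          # [1] 80000 80000
print(m[1, 1])         # [1] TRUE
print(m[2, 2])         # [1] FALSE (implicit zero, not stored)
print(object.size(m))  # a few hundred KB instead of tens of GB
```

Only the non-zero entries plus one column-pointer per column are stored, so memory grows with the number of TRUE values rather than with n * n.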
Although allocating a matrix of this size did not raise an error on 64-bit Linux (R version 3.2.0), basic operations on it failed:
```r
x <- matrix(0, nrow=80000, ncol=80000)
dim(x)   # [1] 80000 80000
x[1,1]   # Error: long vectors not supported yet: subset.c:733
```
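The arithmetic below is a sketch of why the two approaches behave differently, assuming the standard facts that a matrix is stored as a single vector, that vectors longer than .Machine$integer.max (2^31 - 1) are "long vectors", and that R stores each logical value in 4 bytes. The 48 GB observed above is higher than this raw payload; intermediate copies during construction are a plausible, unverified explanation.

```r
n <- 80000

# A matrix stores all cells in one vector.
n_cells <- n * n                                # 6.4e9 (a double)
exceeds_int <- n_cells > .Machine$integer.max   # TRUE: a long vector

# Each data frame column is its own vector of length n.
column_ok <- n <= .Machine$integer.max          # TRUE: an ordinary vector

# Raw payload for an all-logical table at 4 bytes per value.
gb_needed <- n_cells * 4 / 1e9                  # 25.6 (GB), before any copies

cat("cells:", n_cells, "\n")
cat("single vector is long:", exceeds_int, "\n")
cat("per-column length OK:", column_ok, "\n")
cat("approx. GB:", gb_needed, "\n")
```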
josliber