I have 32 GB of RAM on this machine, but I can still crash R faster than anyone ;)
Example
The goal here is to rbind() two data.tables using functions that take advantage of data.table's efficiency.
input:
rm(list=ls())
gc()
output:
          used (Mb) gc trigger   (Mb)  max used   (Mb)
Ncells 1604987 85.8    2403845  128.4   2251281  120.3
Vcells 3019405 23.1  537019062 4097.2 468553954 3574.8
input:
tmp.table <- data.table(X1=sample(1:7,4096000,replace=TRUE),
                        X2=as.factor(sample(1:2,4096000,replace=TRUE)),
                        X3=sample(1:1000,4096000,replace=TRUE),
                        X4=sample(1:256,4096000,replace=TRUE),
                        X5=sample(1:16,4096000,replace=TRUE),
                        X6=rnorm(4096000))
setkey(tmp.table,X1,X2,X3,X4,X5,X6)
join.table <- data.table(X1=integer(), X2=factor(), X3=integer(),
                         X4=integer(), X5=integer(), X6=numeric())
setkey(join.table,X1,X2,X3,X4,X5,X6)
tables()
output:
     NAME            NROW  MB COLS              KEY
[1,] join.table         0   1 X1,X2,X3,X4,X5,X6 X1,X2,X3,X4,X5,X6
[2,] tmp.table  4,096,000 110 X1,X2,X3,X4,X5,X6 X1,X2,X3,X4,X5,X6
Total: 111MB
input:
join.table <- merge(join.table,tmp.table,all.y=TRUE)
output:
Ha! Nope. RStudio restarts the session.
Question
What's going on here? Explicitly defining the factor levels in join.table had no effect. Using rbind() instead of merge() didn't help either: exactly the same behavior. I have done a few more complex and heavier operations on this data without any problems.
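For anyone who wants to poke at this without risking their session, here is a scaled-down sketch of the same steps (4,096 rows instead of 4,096,000; the smaller size is my choice for illustration, not part of the original repro). It also shows the rbind() route via rbindlist(), data.table's list-based concatenation, which skips merge()'s key-join machinery:

```r
library(data.table)

# Scaled-down version of the setup above (n is reduced purely for
# illustration; the full-size n = 4096000 is what crashes for me).
n <- 4096
tmp.small <- data.table(X1 = sample(1:7,    n, replace = TRUE),
                        X2 = as.factor(sample(1:2, n, replace = TRUE)),
                        X3 = sample(1:1000, n, replace = TRUE),
                        X4 = sample(1:256,  n, replace = TRUE),
                        X5 = sample(1:16,   n, replace = TRUE),
                        X6 = rnorm(n))
setkey(tmp.small, X1, X2, X3, X4, X5, X6)

# Empty table with matching column types, as in the question.
join.small <- data.table(X1 = integer(), X2 = factor(), X3 = integer(),
                         X4 = integer(), X5 = integer(), X6 = numeric())

# Concatenate instead of merge(): rbindlist() appends rows by position
# and does not build a join index.
join.small <- rbindlist(list(join.small, tmp.small))
stopifnot(nrow(join.small) == n)
```

Increasing n step by step might help bisect at what size the crash appears.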
Version information
$platform
[1] "x86_64-pc-linux-gnu"

$arch
[1] "x86_64"

$os
[1] "linux-gnu"

$system
[1] "x86_64, linux-gnu"

$version.string
[1] "R version 3.0.2 (2013-09-25)"

$nickname
[1] "Frisbee Sailing"

> rstudio::versionInfo()
$version
[1] '99.9.9'

$mode
[1] "server"
data.table version 1.8.11.