Random forest on a large dataset

I have a large dataset in R (1M+ rows by 6 columns) that I want to use to train a random forest (using the randomForest package) for regression. Unfortunately, when I try to use all of the data at once I get the error "Error in matrix(0, n, n) : too many elements specified", and when I work on a subset of the data (up to 10,000 or so observations) I get "cannot allocate enough memory" errors.

I can't add more RAM to my machine, and random forests are well suited to the type of process I'm trying to model, so I would really like to make this work.

Any suggestions or workarounds are greatly appreciated.

+5
2 answers

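You are most likely asking randomForest to compute the proximity matrix, which for this data would be enormous: 1 million x 1 million. A matrix of that size is needed no matter how small you set sampsize. Googling the error message points the same way: the proximity calculation appears to be the only place in the package where matrix(0, n, n) is allocated.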

Beyond that, it is hard to help further without seeing the actual code you are running.
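A minimal sketch of a randomForest call that skips the n x n proximity matrix and limits per-tree memory through sampsize and nodesize. The simulated data frame, the column names and every parameter value are assumptions for illustration only; none of them come from the question.

```r
library(randomForest)

set.seed(42)

# Simulated stand-in for the real data (hypothetical): 5 predictors, 1 response
n <- 20000
xs <- paste0("x", 1:5)
dat <- as.data.frame(matrix(rnorm(n * 5), ncol = 5))
names(dat) <- xs
dat$y <- dat$x1 + 2 * dat$x2 + rnorm(n)

# Passing x and y directly avoids the extra copies made by the formula interface
fit <- randomForest(
  x = dat[, xs],
  y = dat$y,
  ntree = 50,
  sampsize = 2000,    # rows drawn for each tree
  nodesize = 50,      # larger terminal nodes give smaller trees
  proximity = FALSE   # do not build the n x n proximity matrix
)

print(fit)
```

If the error persists with proximity = FALSE, the cause is probably elsewhere in the call, which is why the exact code matters.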

+11

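The bigrf package for R may be worth a look, since it is aimed at data sets that are too large for the standard implementation. Note, however, that bigrf has been removed from CRAN (see: bigrf).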

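Another approach is to train separate RFs on subsets of the data and combine them (see the related question on combining random forests built with different training sets in R). Be aware, though, that the combined RFs may not perform as well as a single RF trained on all of the data (YMMV).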
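A minimal sketch of the subset-and-combine idea, using combine() from the randomForest package. The simulated data, the three equal-sized chunks and the tree counts are assumptions for illustration, not values from the question.

```r
library(randomForest)

set.seed(1)

# Simulated stand-in for the real data (hypothetical)
n <- 6000
xs <- paste0("x", 1:5)
dat <- as.data.frame(matrix(rnorm(n * 5), ncol = 5))
names(dat) <- xs
dat$y <- dat$x1 - dat$x3 + rnorm(n)

# Split the row indices into three equal-sized chunks
chunks <- split(seq_len(n), rep(1:3, length.out = n))

# Train one small forest per chunk, so only one chunk is modelled at a time
forests <- lapply(chunks, function(idx) {
  randomForest(x = dat[idx, xs], y = dat$y[idx], ntree = 30)
})

# Merge the trees into a single forest. The out-of-bag summaries of the
# merged object (mse, rsq) are dropped, so evaluate it on held-out data.
big_rf <- do.call(randomForest::combine, unname(forests))

big_rf$ntree
head(predict(big_rf, newdata = dat[, xs]))
```

Each tree in the merged forest has seen only its own chunk, which is one reason the result can differ from a single forest trained on all rows.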

+1
