The nodesize parameter is ignored in the randomForest package.

Is the nodesize parameter ignored in the randomForest package? When I predict the terminal nodes for a dataset and check the counts, I see values that are less than nodesize. I would fix it myself, but the core code is written in Fortran. If someone can confirm this behavior, I will report it to the package maintainer and hope a fix follows.

> library(randomForest)
> set.seed(1)
> rf <- randomForest(mtcars[,-1], mtcars[,1], nodesize = 5)
> nodes <- attr(predict(rf, mtcars[,-1], nodes = TRUE), 'nodes')

# node counts of first tree
> table(nodes[,1])

# first row is the terminal node ID#, second row is the count
 2  6  9 10 11 14 15 16 18 19 
 5  3  3  6  4  2  3  1  3  2 

Adding system information:

Session info----------------------------------------------------------------
 setting  value                       
 version  R version 3.1.1 (2014-07-10)
 system   x86_64, mingw32             
 ui       RStudio (0.98.1049)         
 language (EN)                        
 collate  English_United States.1252  
 tz       America/Chicago             

Packages--------------------------------------------------------------------
 package      * version date       source        
 randomForest * 4.6.10  2014-07-17 CRAN (R 3.1.1)
1 answer

Reply from the package maintainer:

This parameter behaves as Leo Breiman intended. The bug is in how the parameter was described. It's the same as minsplit in the function rpart:::rpart.control():

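the minimum number of observations that must exist in a node in order for a split to be attempted.

Terminal nodes can therefore hold fewer observations than nodesize.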
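The minsplit comparison in the reply can be checked directly in rpart. The following is a minimal sketch, not the maintainer's code: it assumes the rpart package is installed, and the choices minbucket = 1 and cp = 0 are illustrative settings picked here so that minsplit is the main stopping rule. It compares the sizes of nodes that were split with the sizes of terminal nodes.

```r
library(rpart)

# minsplit = 5: a node must hold at least 5 observations before a split is tried.
# minbucket = 1 and cp = 0 (illustrative choices) relax the other stopping rules.
fit <- rpart(mpg ~ ., data = mtcars,
             control = rpart.control(minsplit = 5, minbucket = 1, cp = 0))

frame   <- fit$frame
is_leaf <- frame$var == "<leaf>"

split_sizes <- frame$n[!is_leaf]  # observations in nodes that were split
leaf_sizes  <- frame$n[is_leaf]   # observations in terminal nodes

min(split_sizes)   # smallest node that was split
table(leaf_sizes)  # distribution of terminal node sizes
```

Every node that was split holds at least minsplit observations, but nothing in this setting stops the resulting children, and hence the terminal nodes, from being smaller. That matches the counts below nodesize reported in the question.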
