The first column that pretty.gbm.tree prints is row.names , which is assigned in the pretty.gbm.tree.R script as row.names(temp) <- 0:(nrow(temp)-1) , where temp is the tree information stored as a data.frame . The correct way to interpret row.names is as a node_id , with the root node set to 0.
In your example:
    Id SplitVar SplitCodePred LeftNode RightNode MissingNode ErrorReduction Weight  Prediction
     0        9  6.250000e+01        1         2          21      0.6634681   5981 0.005000061
means that the root node (indicated by row number 0) is split on split variable 9 (the numbering of split variables starts at 0 here, so the split variable is the 10th column of the training set x ). SplitCodePred of 62.5 (printed as 6.250000e+01) means that all points with a value below 62.5 went to LeftNode 1 , and all remaining points (62.5 or above) went to RightNode 2 . All points with a missing value in this column were assigned to MissingNode 21 . The ErrorReduction due to this split was 0.6634, and the root node held 5981 observations ( Weight ). Prediction of 0.005 denotes the value assigned to all points in this node before it was split. In the case of terminal nodes (or leaves), indicated by -1 in SplitVar , LeftNode , RightNode and MissingNode , Prediction denotes the value predicted for all points belonging to that leaf, already multiplied by shrinkage .
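To make the reading of one row concrete, here is a minimal sketch that routes a single observation through a table laid out like the pretty.gbm.tree output. The toy tree, its values, and the helper name find_leaf are made up for illustration and are not part of gbm; the sketch also assumes continuous splits only, since for categorical variables SplitCodePred is not a threshold.

```r
# Toy tree in the same column layout as pretty.gbm.tree() output.
# All numbers are invented for illustration.
tree <- data.frame(
  SplitVar      = c(9, -1, -1, -1),
  SplitCodePred = c(62.5, -0.002, 0.004, 0.001),
  LeftNode      = c(1, -1, -1, -1),
  RightNode     = c(2, -1, -1, -1),
  MissingNode   = c(3, -1, -1, -1),
  Prediction    = c(0.005, -0.002, 0.004, 0.001)
)
row.names(tree) <- 0:(nrow(tree) - 1)   # node ids start at 0

# Hypothetical helper: walk from the root to a terminal node for one
# observation x (a numeric vector ordered like the training columns).
find_leaf <- function(tree, x) {
  node <- 0
  repeat {
    row <- tree[node + 1, ]            # node ids are 0-based, R rows are 1-based
    if (row$SplitVar == -1) {
      return(node)                     # -1 marks a terminal node
    }
    value <- x[row$SplitVar + 1]       # SplitVar is 0-based as well
    if (is.na(value)) {
      node <- row$MissingNode
    } else if (value < row$SplitCodePred) {
      node <- row$LeftNode
    } else {
      node <- row$RightNode
    }
  }
}

x <- rep(0, 10)
x[10] <- 50                            # 10th column is split variable 9
leaf <- find_leaf(tree, x)
tree[leaf + 1, "Prediction"]           # value stored for that leaf
```

Setting x[10] to NA sends the point to the MissingNode instead, which is the third branch every split in this table carries.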
To understand the structure of the tree, it is important to note that the tree is split in a depth-first manner. When the root node (node id 0) is split into its left and right nodes, the left side is processed until no further splits are possible, and only then does processing return to number the right node. In both trees in your example, RightNode has the value 2. This is because in both cases the LeftNode turns out to be a leaf node.
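The depth-first numbering can be sketched as follows. This is not gbm's own code: the nested-list tree and the helpers leaf, split_node and number_nodes are hypothetical. The left, right, missing order is inferred from the example row above, where the children of node 0 are 1, 2 and 21.

```r
# Hypothetical nested-list tree: a leaf is an empty list, a split has
# left / right / missing children.
leaf <- function() list()
split_node <- function(left, right, missing) {
  list(left = left, right = right, missing = missing)
}

# Assign node ids depth-first (pre-order): the whole left subtree is
# numbered before the right child gets its id, then the missing child.
number_nodes <- function(node, next_id = 0) {
  id <- next_id
  if (length(node) == 0) {
    rows <- data.frame(Id = id, LeftNode = -1, RightNode = -1, MissingNode = -1)
    return(list(rows = rows, next_id = id + 1))
  }
  left    <- number_nodes(node$left, id + 1)
  right   <- number_nodes(node$right, left$next_id)
  missing <- number_nodes(node$missing, right$next_id)
  me <- data.frame(Id = id, LeftNode = id + 1,
                   RightNode = left$next_id, MissingNode = right$next_id)
  list(rows = rbind(me, left$rows, right$rows, missing$rows),
       next_id = missing$next_id)
}

# Left child is a leaf, so the right child of the root gets id 2.
toy_a <- split_node(leaf(), split_node(leaf(), leaf(), leaf()), leaf())
ids_a <- number_nodes(toy_a)$rows

# Left child is itself a split, so ids 1 to 4 are used up on the left
# side and the right child of the root only gets id 5.
toy_b <- split_node(split_node(leaf(), leaf(), leaf()), leaf(), leaf())
ids_b <- number_nodes(toy_b)$rows
```

Comparing ids_a and ids_b shows why a RightNode of 2 at the root implies that the LeftNode is a leaf.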