clf.tree_.feature: what does it return? (scikit-learn)

I noticed that scikit-learn's clf.tree_.feature sometimes returns negative values, for example -2. As far as I understand, clf.tree_.feature should return the features in sequential order. If we have an array of feature names ['feature_one', 'feature_two', 'feature_three'], then -2 would refer to feature_two. I am surprised by the use of a negative index; it would make more sense to refer to feature_two by index 1. (-2 is a reference that is convenient for humans to read, not for machine processing.) Am I reading this right?

Update: Here is an example:

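    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def leaf_ordering():
        X = np.genfromtxt('X.csv', delimiter=',')
        Y = np.genfromtxt('Y.csv', delimiter=',')
        dt = DecisionTreeClassifier(min_samples_leaf=10, random_state=99)
        dt.fit(X, Y)
        # Feature index used at each node of the fitted tree
        print(dt.tree_.feature)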

Here are the X and Y files

Here is the result:

  [ 8 9 -2 -2 9 4 -2 9 8 -2 -2 0 0 9 9 8 -2 -2 9 -2 -2 6 -2 -2 -2 2 -2 9 8 6 9 -2 -2 -2 8 9 -2 9 6 -2 -2 -2 6 -2 -2 9 -2 6 -2 -2 2 -2 -2] 
2 answers

Reading the Cython source code of the tree builder, we see that -2 is just a placeholder value for the split feature of a leaf node.

Line 63

 TREE_UNDEFINED = -2 

Line 359

    if is_leaf:
        # Node is not expandable; set node as leaf
        node.left_child = _TREE_LEAF
        node.right_child = _TREE_LEAF
        node.feature = _TREE_UNDEFINED
        node.threshold = _TREE_UNDEFINED
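
Since -2 marks a leaf rather than a position counted from the end of the feature list, it helps to filter it out before looking up names. The sketch below uses the first few values of the output from the question. The names in `feature_names` are hypothetical, and it assumes the data has ten columns (the output contains indices up to 9).

```python
# Leaf marker, as quoted from the Cython source above
TREE_UNDEFINED = -2

# First entries of the tree_.feature output shown in the question
feature = [8, 9, -2, -2, 9, 4, -2]

# Hypothetical feature names; assumes ten input columns
feature_names = [f"feature_{i}" for i in range(10)]

# Naive lookup: Python reads -2 as "second from the end",
# so leaves are silently mislabeled as feature_8
naive = [feature_names[f] for f in feature]

# Safer lookup: treat -2 as "this node is a leaf, no split feature"
labels = [
    feature_names[f] if f != TREE_UNDEFINED else "leaf"
    for f in feature
]

print(labels)
```

The naive version does not raise an error, which is probably why the -2 looks like a deliberate negative index at first glance.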

A bit of additional information for those who stumbled upon this old question just like me.

As the OP writes, clf.tree_.feature lists the nodes and leaves in sequential order, and that order is a depth-first traversal. It starts at the root node and follows the left children until it reaches a leaf (encoded as -2). From the leaf it climbs back up the tree to the nearest node whose right child has not been visited yet, then descends into that right subtree in the same way until it again reaches a leaf.

In the OP's example, the root node splits on feature 8, and its left child splits on feature 9. Going one level further down, we immediately reach a leaf, so we climb back up to the nearest non-leaf node. Its right child is also a leaf (both children of this feature 9 node are leaves). Climbing back up to the root, we next get feature 9 again, this time as the root's right child at the first level of the hierarchy. This feature 9 node has a left child that splits on feature 4, whose left child is a leaf; the right child of the feature 4 node again splits on feature 9, and so on.
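
The walk described above can be checked mechanically. The following is a minimal sketch, not scikit-learn code: `rebuild_children` is a hypothetical helper that recovers the child links from the feature array alone, assuming that node ids follow a depth-first, left-child-first order and that every split node has exactly two children.

```python
TREE_UNDEFINED = -2  # leaf marker in tree_.feature
TREE_LEAF = -1       # "no child" marker used here for leaves

# tree_.feature output copied from the question
feature = [8, 9, -2, -2, 9, 4, -2, 9, 8, -2, -2, 0, 0, 9, 9, 8, -2, -2,
           9, -2, -2, 6, -2, -2, -2, 2, -2, 9, 8, 6, 9, -2, -2, -2, 8, 9,
           -2, 9, 6, -2, -2, -2, 6, -2, -2, 9, -2, 6, -2, -2, 2, -2, -2]


def rebuild_children(feature):
    """Recover left/right child ids from a depth-first feature array."""
    n = len(feature)
    left = [TREE_LEAF] * n
    right = [TREE_LEAF] * n

    def visit(node):
        # Returns the id of the first node after this node's subtree.
        if feature[node] == TREE_UNDEFINED:
            return node + 1                 # a leaf occupies one slot
        left[node] = node + 1               # left child comes right after
        right[node] = visit(left[node])     # right child follows left subtree
        return visit(right[node])

    end = visit(0)
    return left, right, end


left, right, end = rebuild_children(feature)

print(left[0], right[0])   # 1 4: root's children split on feature 9
print(left[4], right[4])   # 5 50: node 4's left child splits on feature 4
print(end == len(feature))  # True: the walk consumes the whole array
```

On a fitted estimator there is no need to reconstruct these links: `clf.tree_.children_left` and `clf.tree_.children_right` hold them directly. The reconstruction only serves to confirm that the array in the question is consistent with the depth-first reading.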
