How are feature_importances_ values defined in RandomForestClassifier?

I have a classification task with a time series as input, where each attribute (n = 23) represents a specific point in time. Besides the absolute classification result, I would like to know which attributes/dates contribute to it. So I simply use feature_importances_ , which works well for me.

However, I would like to know how they are calculated and which measure/algorithm is used. Unfortunately, I could not find any documentation on this topic.
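For context, a minimal sketch of the attribute in question (synthetic data; the 23 time-point features and the classifier settings here are placeholders, not the asker's actual setup):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.RandomState(0)
X = rng.randn(200, 23)                        # 200 samples, 23 time-point features
y = (X[:, 5] + X[:, 12] > 0).astype(int)      # classes driven by features 5 and 12

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)

# One score per feature; the normalized scores sum to 1.
importances = clf.feature_importances_
print(importances.shape)                      # (23,)
print(np.argsort(importances)[::-1][:3])      # indices of the most important features
```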

scikit-learn random-forest feature-selection
Apr 04 '13 at 11:53
6 answers

There are indeed several ways to get feature "importances". As is often the case, there is no strict consensus on what this word means.

In scikit-learn, we implement the importance as described in [1] (often cited, but unfortunately rarely read...). It is sometimes called "gini importance" or "mean decrease impurity" and is defined as the total decrease in node impurity (weighted by the probability of reaching that node, which is approximated by the proportion of samples reaching it), averaged over all trees of the ensemble.

In the literature or in some other packages, you can also find importances implemented as "mean decrease accuracy". Basically, the idea is to measure the decrease in accuracy on OOB data when you randomly permute the values of a given feature. If the decrease is low, then the feature is not important, and vice versa.

(Note that both algorithms are available in the randomForest R package.)
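As an illustration of the two metrics (this code is not from the original answer): `feature_importances_` gives the mean decrease in impurity, and `sklearn.inspection.permutation_importance` gives a permutation-based analogue — note that it scores on a dataset you supply rather than on the OOB samples:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

X, y = load_iris(return_X_y=True)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Mean decrease impurity, as described in [1]; normalized to sum to 1.
print(np.round(clf.feature_importances_, 3))

# Permutation importance: drop in score when one feature is shuffled.
perm = permutation_importance(clf, X, y, n_repeats=10, random_state=0)
print(np.round(perm.importances_mean, 3))
```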

[1]: Breiman, Friedman, Olshen, Stone, "Classification and Regression Trees", 1984.

Apr 04

The usual way to compute feature importance values for a single tree is as follows:

  • you initialize an array feature_importances of all zeros, with size n_features ;

  • you traverse the tree: for each internal node that splits on feature i , you compute the error reduction of that node multiplied by the number of samples that were routed to the node, and add this quantity to feature_importances[i] .

The error reduction depends on the impurity criterion you use (e.g. Gini, entropy, MSE, ...). It is the impurity of the set of examples that gets routed to the internal node minus the sum of the impurities of the two partitions created by the split.
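The two steps above can be sketched in Python against a fitted scikit-learn tree, using the public `tree_` attribute (here `weighted_n_node_samples` plays the role of the number of samples routed to a node):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(random_state=0).fit(X, y)
t = clf.tree_

importances = np.zeros(X.shape[1])            # step 1: all zeros
for node in range(t.node_count):              # step 2: traverse the tree
    left, right = t.children_left[node], t.children_right[node]
    if left == -1:                            # leaf: no split, nothing to add
        continue
    # error reduction of the split, weighted by samples reaching the node
    importances[t.feature[node]] += (
        t.weighted_n_node_samples[node] * t.impurity[node]
        - t.weighted_n_node_samples[left] * t.impurity[left]
        - t.weighted_n_node_samples[right] * t.impurity[right]
    )

importances /= importances.sum()              # normalize to sum to 1
assert np.allclose(importances, clf.feature_importances_)
```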

It is important to note that these values are relative to a specific dataset (both the error reduction and the number of samples are dataset-specific), therefore these values cannot be compared between different datasets.

As far as I know, there are alternative ways to compute feature importance values in decision trees. A brief description of the method above can be found in The Elements of Statistical Learning by Trevor Hastie, Robert Tibshirani, and Jerome Friedman.

Apr 04 '13 at 19:32

It is the ratio between the number of samples routed to a decision node involving that feature, in any of the trees of the ensemble, and the total number of samples in the training set.

Features that are involved in the top-level nodes of the decision trees tend to see more samples and are therefore likely to have more importance.

Edit: this description is only partially correct: Gilles's and Peter's answers are the correct answer.

Apr 04 '13 at 12:22

As @GillesLouppe noted above, scikit-learn currently implements the "mean decrease impurity" metric for feature importances. I personally find the second metric a bit more interesting, where you randomly permute the values of each of your features one by one and see how much worse your out-of-bag performance gets.

Since what you're after with feature importance is how much each feature contributes to your overall model's predictive performance, the second metric actually gives you a direct measure of this, whereas the "mean decrease impurity" is just a good proxy.
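A minimal hand-rolled sketch of that permutation idea (an assumption: it scores on a held-out validation set rather than the OOB samples described above, for simplicity):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

baseline = clf.score(X_val, y_val)
rng = np.random.RandomState(0)
drops = []
for i in range(X_val.shape[1]):
    X_perm = X_val.copy()
    X_perm[:, i] = rng.permutation(X_perm[:, i])  # break feature/target link
    drops.append(baseline - clf.score(X_perm, y_val))

# A large drop in accuracy means the model relied on that feature.
print(np.round(drops, 3))
```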

If you're interested, I wrote a small package that implements the permutation importance metric and can be used to compute the values from an instance of a scikit-learn random forest class:

https://github.com/pjh2011/rf_perm_feat_import

Edit: this works for Python 2.7, not 3

Jan 6 '17 at 22:25

Let me try to answer the question. The code:

    from sklearn import datasets
    from sklearn.tree import DecisionTreeClassifier

    iris = datasets.load_iris()
    X = iris.data
    y = iris.target
    clf = DecisionTreeClassifier()
    clf.fit(X, y)

The resulting decision tree (image omitted).

We get clf.feature_importances_ : [0., 0.01333333, 0.06405596, 0.92261071]
Check out the source code:

    cpdef compute_feature_importances(self, normalize=True):
        """Computes the importance of each feature (aka variable)."""
        cdef Node* left
        cdef Node* right
        cdef Node* nodes = self.nodes
        cdef Node* node = nodes
        cdef Node* end_node = node + self.node_count

        cdef double normalizer = 0.

        cdef np.ndarray[np.float64_t, ndim=1] importances
        importances = np.zeros((self.n_features,))
        cdef DOUBLE_t* importance_data = <DOUBLE_t*>importances.data

        with nogil:
            while node != end_node:
                if node.left_child != _TREE_LEAF:
                    # ... and node.right_child != _TREE_LEAF:
                    left = &nodes[node.left_child]
                    right = &nodes[node.right_child]

                    importance_data[node.feature] += (
                        node.weighted_n_node_samples * node.impurity -
                        left.weighted_n_node_samples * left.impurity -
                        right.weighted_n_node_samples * right.impurity)
                node += 1

        importances /= nodes[0].weighted_n_node_samples

        if normalize:
            normalizer = np.sum(importances)

            if normalizer > 0.0:
                # Avoid dividing by zero (e.g., when root is pure)
                importances /= normalizer

        return importances

Let's try to calculate the feature importances by hand:

    print("sepal length (cm)", 0)
    print("sepal width (cm)", 3*0.444 - (0 + 0))
    print("petal length (cm)",
          (54*0.168 - (48*0.041 + 6*0.444))
          + (46*0.043 - (0 + 3*0.444))
          + (3*0.444 - (0 + 0)))
    print("petal width (cm)",
          (150*0.667 - (0 + 100*0.5))
          + (100*0.5 - (54*0.168 + 46*0.043))
          + (6*0.444 - (0 + 3*0.444))
          + (48*0.041 - (0 + 0)))

We get the raw feature importances: np.array([0, 1.332, 6.418, 92.30]).
After normalization, we get array([0., 0.01331334, 0.06414793, 0.92253873]), which is the same as clf.feature_importances_ .
Be careful: all classes are supposed to have weight one.
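The normalization step can be checked directly (the raw numbers are taken from the calculation above):

```python
import numpy as np

# Raw importances from the hand calculation above.
raw = np.array([0.0, 1.332, 6.418, 92.30])

# Normalize so the importances sum to 1.
normalized = raw / raw.sum()
print(np.round(normalized, 8))
```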

Aug 06 '18 at 7:59

For those looking for a link to the scikit-learn documentation on this topic, or to the answer from @GillesLouppe:

In RandomForestClassifier, the estimators_ attribute is a list of DecisionTreeClassifier (as mentioned in the documentation). To compute feature_importances_ for the RandomForestClassifier, the scikit-learn source code averages the feature_importances_ attributes over all the estimators (all DecisionTreeClassifier) in the ensemble.
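That averaging can be verified directly: since each tree's importances are already normalized, the forest's importances equal the mean of the per-tree importances over estimators_ (a sketch, assuming no degenerate single-node trees in the ensemble):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=25, random_state=0).fit(X, y)

# Stack each tree's normalized importances and average over the ensemble.
per_tree = np.array([tree.feature_importances_ for tree in forest.estimators_])
averaged = per_tree.mean(axis=0)

assert np.allclose(averaged, forest.feature_importances_)
```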

The DecisionTreeClassifier documentation mentions that "the importance of a feature is computed as the (normalized) total reduction of the criterion brought by that feature. It is also known as the Gini importance [1]."

Here is a direct link with more on variable and Gini importance, as referenced by the scikit-learn documentation below.

[1] L. Breiman and A. Cutler, Random Forests, http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm

Aug 13 '18 at 19:40


