Can I make partial dependence plots for DecisionTreeClassifier in scikit-learn (and R)?

I have old code using scikit-learn's DecisionTreeClassifier, and I would like to make partial dependence plots based on this classifier.

All the examples that I have seen so far (for example, http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.partial_dependence.plot_partial_dependence.html ) use GradientBoostingRegressor.

My question is: is it possible to make partial dependence plots for another estimator (e.g. DecisionTreeClassifier)? I tried the following code:

 from sklearn.tree import DecisionTreeClassifier
 from sklearn.ensemble.partial_dependence import plot_partial_dependence
 from sklearn.datasets import make_friedman1

 X, y = make_friedman1()
 clf = DecisionTreeClassifier(max_features='auto').fit(X, y)
 fig, axs = plot_partial_dependence(clf, X, [0, (0, 1)])

but it fails with the following error:

 ValueError: gbrt has to be an instance of BaseGradientBoosting 

I found this comment online (on Quora):

Partial dependence plots are generally independent of the particular choice of classifier. The partial dependence plotting module, which the documentation demonstrates with gradient boosting, would work just as well with, say, a random forest classifier.

However, I still don't know how to make this work.

Also, in R I can make partial dependence plots with the randomForest package. However, I'm not quite sure how it is implemented; in the manual, the package author Andy Liaw cites "Friedman, J. (2001). Greedy function approximation: a gradient boosting machine. Ann. of Stat."

Does this mean that I have to use gradient boosting to get partial dependence plots?

Any help is appreciated. Many thanks!

python scikit-learn r
3 answers

As indicated in the error message, you need to use a model whose base class is BaseGradientBoosting.

From the documentation you linked:

gbrt : BaseGradientBoosting

A fitted gradient boosting model

Both GradientBoostingClassifier and GradientBoostingRegressor inherit from BaseGradientBoosting ( source ), so in theory either of those classes should work. Other estimators do not appear to be supported by the plot_partial_dependence function.
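A minimal sketch of the working case this answer describes, fitting a GradientBoostingRegressor (note: in recent scikit-learn releases the plotting helper moved out of sklearn.ensemble.partial_dependence, so the exact import depends on your version; the historical call is shown in a comment):

```python
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor

# GradientBoostingRegressor inherits from BaseGradientBoosting, so it is
# accepted by the partial dependence helpers discussed above.
X, y = make_friedman1(n_samples=200, random_state=0)
model = GradientBoostingRegressor(n_estimators=50, random_state=0).fit(X, y)

# With the scikit-learn version the question uses, the working call was:
#   from sklearn.ensemble.partial_dependence import plot_partial_dependence
#   fig, axs = plot_partial_dependence(model, X, [0, (0, 1)])
print(model.predict(X).shape)
```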


A partial dependence plot is a way of revealing how a feature interacts with the target under a particular model. You can think of it as the analogue of a coefficient (beta) in linear regression: in a nonlinear model, partial dependence is what tells you how the model responds at each different value of a feature. Another difference is that feature interactions play a much larger role in most nonlinear models. The fact that scikit-learn only supports partial dependence plots for gradient boosting models does not mean you cannot apply the method to other models (which is rather sad; I have been waiting for them to fix this for quite some time).

Here is an example of computing partial dependence for an XGBoost model in Python: https://xiaoxiaowang87.imtqy.com/monotonicity_constraint/

This post also illustrates how it is calculated: https://medium.com/usf-msds/intuitive-interpretation-of-random-forest-2238687cae45 You can use that logic to write your own function.
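Following the logic described above, a hand-rolled one-dimensional partial dependence function is short: for each grid value of the chosen feature, overwrite that column for every row and average the model's predicted probabilities. This is a sketch using a DecisionTreeClassifier on a binary subset of the iris data (the dataset and parameters are illustrative choices, not from the original post):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

def partial_dependence_1d(model, X, feature_idx, grid):
    """Average class-1 probability with one feature forced to each grid value."""
    averaged = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature_idx] = value   # force the feature to this value everywhere
        averaged.append(model.predict_proba(X_mod)[:, 1].mean())
    return np.array(averaged)

X, y = load_iris(return_X_y=True)
X, y = X[y < 2], y[y < 2]               # keep two classes for a binary problem
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Evaluate partial dependence of petal length (feature 2) over its range.
grid = np.linspace(X[:, 2].min(), X[:, 2].max(), 20)
pdp = partial_dependence_1d(clf, X, 2, grid)
print(pdp.shape)
```

Plotting `grid` against `pdp` then gives the partial dependence curve for any model with a `predict_proba` method.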

In addition, R has libraries that support partial dependence for random forest models (I have not used R for a while and can't remember the exact package name).


This bothered me for a long time: partial dependence plots were only available for gradient boosting estimators.

Fortunately, this has been resolved, and a new scikit-learn release (about 3 weeks ago as of writing) means you can use partial dependence plots with any fitted estimator!

The effort is described here: https://github.com/scikit-learn/scikit-learn/pull/12599 (I had nothing to do with it, I'm just a grateful end user).

