Cascading classifiers for multiclass tasks in scikit-learn

Question


Let's say I have a multiclass classification problem with a hierarchical structure, for example "edible", "nutritious" and "~nutritious", so it can be represented like this:

├── edible
│   ├── nutritious
│   └── ~nutritious
└── ~edible

While you can get reasonable performance with classifiers that support multiclass classification, or by using one-vs-one or one-vs-all schemes for those that don't, it can also be useful to train a separate classifier at each level and chain them, so that only instances classified as "edible" are then classified as nutritious or not.

I would like to use scikit-learn classes as building blocks, and I wonder whether I can make Pipeline support this, or whether I will need to write my own estimator that implements BaseEstimator and possibly BaseEnsemble to do this.

This was mentioned before by @ogrisel on the mailing list (http://sourceforge.net/mailarchive/message.php?msg_id=31417048), and I'm wondering if anyone has any ideas or suggestions on how to do this.


python scikit-learn machine-learning data-mining

tiao Jan 16 '14 at 0:44


1 answer

ogrisel · Accepted Answer · 2014-01-16T10:20:37+0000

You can write your own class as a meta-estimator that takes a base_estimator and an ordered list of lists of target classes describing the cascade as constructor parameters. In the fit method of this meta-classifier, you slice the data based on those classes, fit a clone of the base_estimator for each level, and store the resulting sub-classifiers in an attribute of the meta-classifier.

In the predict method, you iterate over the cascade structure again, and this time you call predict on the underlying sub-classifiers to slice your predictions and pass them down to the next level recursively. You will need a fair amount of numpy fancy indexing ;)
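
The fit and predict steps described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not an existing scikit-learn class: the name CascadeClassifier, the levels parameter, and the REST marker are all hypothetical, class labels are assumed to be non-negative integers, and every level is assumed to end up with at least two distinct training targets. Here 0 stands for "~edible", 1 for "nutritious" and 2 for "~nutritious".

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, clone
from sklearn.linear_model import LogisticRegression


class CascadeClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical cascade meta-estimator (not part of scikit-learn).

    levels is an ordered list of lists of integer class labels. Classes
    listed at level i are decided at that level; every other remaining class
    is lumped into a temporary "rest" label and handed to level i + 1.
    """

    REST = -1  # assumption: real class labels are non-negative integers

    def __init__(self, base_estimator, levels):
        self.base_estimator = base_estimator
        self.levels = levels

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        self.estimators_ = []
        remaining = np.ones(len(y), dtype=bool)  # samples still in the cascade
        for classes in self.levels:
            is_terminal = np.isin(y, classes)
            # Keep the true label for this level's classes, mark the others.
            target = np.where(is_terminal, y, self.REST)[remaining]
            est = clone(self.base_estimator).fit(X[remaining], target)
            self.estimators_.append(est)
            remaining &= ~is_terminal  # only "rest" samples reach the next level
        self.classes_ = np.unique(y)
        return self

    def predict(self, X):
        X = np.asarray(X)
        y_pred = np.full(len(X), self.REST, dtype=int)
        active = np.arange(len(X))  # indices without a final label yet
        for est in self.estimators_:
            if active.size == 0:
                break
            pred = est.predict(X[active])
            y_pred[active] = pred
            # Samples predicted as "rest" are passed down to the next level.
            active = active[pred == self.REST]
        return y_pred


# Toy data: three well-separated 2-D blobs, one per class.
rng = np.random.RandomState(0)
centers = np.array([[0.0, 0.0], [5.0, 0.0], [5.0, 5.0]])
y = np.repeat([0, 1, 2], 50)
X = centers[y] + rng.normal(scale=0.5, size=(150, 2))

# Level 0 separates "~edible" from the rest; level 1 splits the edible items.
clf = CascadeClassifier(LogisticRegression(), levels=[[0], [1, 2]]).fit(X, y)
print(clf.predict(centers))
```

In this sketch each level is trained on the true labels of the samples that reach it, while at prediction time a sample only moves down if the previous level predicted "rest". A class that appears in y but in none of the levels would come out as the REST marker, so the levels should cover all leaf classes.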

You can git grep base_estimator in the source code to find existing examples of meta-estimators in the code base (e.g. Bagging, AdaBoost, GridSearchCV ...).
