GradientBoostingClassifier with BaseEstimator in scikit-learn?

I tried using GradientBoostingClassifier in scikit-learn and it works fine with the default options. However, when I tried to replace the default base estimator with another classifier (via the init parameter), it failed with the following error:

    return y - np.nan_to_num(np.exp(pred[:, k] -
    IndexError: too many indices

Do you have a solution to this problem?

The error can be reproduced with the following snippet:

    import numpy as np
    from sklearn import datasets
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.utils import shuffle

    mnist = datasets.fetch_mldata('MNIST original')
    X, y = shuffle(mnist.data, mnist.target, random_state=13)
    X = X.astype(np.float32)
    offset = int(X.shape[0] * 0.01)
    X_train, y_train = X[:offset], y[:offset]
    X_test, y_test = X[offset:], y[offset:]

    ### works fine when init is None
    clf_init = None
    print 'Train with clf_init = None'
    clf = GradientBoostingClassifier(loss='deviance', learning_rate=0.1,
                                     n_estimators=5, subsample=0.3,
                                     min_samples_split=2, min_samples_leaf=1,
                                     max_depth=3, init=clf_init,
                                     random_state=None, max_features=None,
                                     verbose=2)
    clf.fit(X_train, y_train)
    print 'Train with clf_init = None is done :-)'

    print 'Train LogisticRegression()'
    clf_init = LogisticRegression()
    clf_init.fit(X_train, y_train)
    print 'Train LogisticRegression() is done'

    print 'Train with clf_init = LogisticRegression()'
    clf = GradientBoostingClassifier(loss='deviance', learning_rate=0.1,
                                     n_estimators=5, subsample=0.3,
                                     min_samples_split=2, min_samples_leaf=1,
                                     max_depth=3, init=clf_init,
                                     random_state=None, max_features=None,
                                     verbose=2)
    clf.fit(X_train, y_train)  # <------ ERROR!!!!
    print 'Train with clf_init = LogisticRegression() is done'

Here is the full traceback:

    Traceback (most recent call last):
      File "/home/mohsena/Dropbox/programing/gbm/gb_with_init.py", line 56, in <module>
        clf.fit(X_train, y_train)
      File "/usr/local/lib/python2.7/dist-packages/sklearn/ensemble/gradient_boosting.py", line 862, in fit
        return super(GradientBoostingClassifier, self).fit(X, y)
      File "/usr/local/lib/python2.7/dist-packages/sklearn/ensemble/gradient_boosting.py", line 614, in fit
        random_state)
      File "/usr/local/lib/python2.7/dist-packages/sklearn/ensemble/gradient_boosting.py", line 475, in _fit_stage
        residual = loss.negative_gradient(y, y_pred, k=k)
      File "/usr/local/lib/python2.7/dist-packages/sklearn/ensemble/gradient_boosting.py", line 404, in negative_gradient
        return y - np.nan_to_num(np.exp(pred[:, k] -
    IndexError: too many indices
4 answers

As the scikit-learn developers suggested, the problem can be solved with an adapter like this:

    class Adapter(object):
        # Wraps a classifier so that predict() returns class
        # probabilities (a numeric prediction) instead of labels.
        def __init__(self, est):
            self.est = est

        def predict(self, X):
            return self.est.predict_proba(X)[:, 1]

        def fit(self, X, y):
            self.est.fit(X, y)
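The wrapped estimator is then passed through the init parameter. A minimal usage sketch, reusing X_train and y_train from the question (the small n_estimators value is just a placeholder):

    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.linear_model import LogisticRegression

    # The adapter's predict() now yields probabilities, which the
    # boosting loss can treat as numeric initial predictions.
    clf = GradientBoostingClassifier(n_estimators=5,
                                     init=Adapter(LogisticRegression()))
    clf.fit(X_train, y_train)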

An improved version of iampat's answer, with a small modification of the scikit-learn developers' suggestion, should do the trick:

    import numpy

    class init:
        def __init__(self, est):
            self.est = est

        def predict(self, X):
            # Return a 2-D column, shape (n_samples, 1), not a 1-D vector.
            return self.est.predict_proba(X)[:, 1][:, numpy.newaxis]

        def fit(self, X, y):
            self.est.fit(X, y)
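The [:, numpy.newaxis] is the key difference from the adapter above: negative_gradient indexes the initial predictions as pred[:, k] (see line 404 in the traceback), which only works on a 2-D array. A small sketch of the shape issue:

    import numpy

    p = numpy.array([0.2, 0.7, 0.9])    # 1-D, as predict_proba(X)[:, 1] returns
    col = p[:, numpy.newaxis]           # 2-D column, shape (3, 1)
    print(col[:, 0])                    # pred[:, k]-style indexing now works
    # p[:, 0] would raise "IndexError: too many indices" instead

The wrapper is used the same way as before, e.g. GradientBoostingClassifier(init=init(LogisticRegression())).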

Here is a complete and, in my opinion, simpler version of iampat's code snippet:

    import numpy
    from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

    class RandomForestClassifier_compability(RandomForestClassifier):
        def predict(self, X):
            # Override predict() to return a 2-D column of probabilities.
            return self.predict_proba(X)[:, 1][:, numpy.newaxis]

    base_estimator = RandomForestClassifier_compability()
    classifier = GradientBoostingClassifier(init=base_estimator)
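From there, training proceeds as usual; a short sketch, assuming the X_train/y_train and X_test/y_test splits from the question:

    # The subclassed forest is fitted internally as the initial estimator.
    classifier.fit(X_train, y_train)
    print(classifier.score(X_test, y_test))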

Gradient boosting generally requires the base learner to be an algorithm that performs numerical prediction, not classification. I suspect this is your problem.
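That matches the traceback: a classifier's predict() returns a 1-D array of class labels, while negative_gradient indexes its input as pred[:, k], which requires a 2-D array of numeric scores. A tiny numpy sketch of the failure mode:

    import numpy as np

    labels = np.array([0, 1, 1, 0])               # 1-D labels, as a classifier's predict() returns
    scores = labels[:, np.newaxis].astype(float)  # 2-D numeric column, as the loss expects

    print(scores[:, 0])       # works on a 2-D array
    try:
        labels[:, 0]          # the pred[:, k] indexing from the traceback
    except IndexError as e:
        print(e)              # "too many indices"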

