Adaptation of a binary stack to a multiclass

I studied this stacking example. There, each set of K folds produces one column of data, and this is repeated for each classifier, i.e. the blending matrices are:

dataset_blend_train = np.zeros((X.shape[0], len(clfs)))
dataset_blend_test = np.zeros((X_submission.shape[0], len(clfs)))

I need to stack predictions from a multiclass problem (15 classes per sample). This produces an n × 15 matrix for each clf.

Should these matrices simply be concatenated horizontally? Or should they be combined in some other way before applying the logistic regression? Thank you
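To make the shapes concrete, here is a minimal sketch of the horizontal-concatenation idea, assuming scikit-learn. It uses 5 classes instead of 15 and fits in-sample for brevity; a real stack would use out-of-fold predictions as in the linked example:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

# Toy multiclass data (5 classes stands in for the 15 in the question).
X, y = make_classification(n_samples=200, n_features=20, n_informative=10,
                           n_classes=5, random_state=0)

clfs = [LogisticRegression(max_iter=1000), RandomForestClassifier(random_state=0)]

blocks = []
for clf in clfs:
    clf.fit(X, y)
    # predict_proba returns an (n_samples, n_classes) block per classifier.
    blocks.append(clf.predict_proba(X))

# Concatenating horizontally gives (n_samples, n_classes * len(clfs)).
blend = np.hstack(blocks)
print(blend.shape)  # (200, 10)
```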

2 answers

You can adapt the code to a multiclass problem in two ways:

  • Concatenate the probabilities horizontally, that is, create:
    dataset_blend_train = np.zeros((X.shape[0], len(clfs) * numOfClasses))
    dataset_blend_test = np.zeros((X_submission.shape[0], len(clfs) * numOfClasses))
  • Instead of probabilities, use the hard class predictions of the base models. This keeps the arrays the same size as in the binary version; simply call predict instead of predict_proba.

I have used both options successfully, but which works best may depend on the data set.
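A sketch of the two options side by side, assuming scikit-learn (cross_val_predict stands in for the manual K-fold loop of the original recipe; the classifiers and class count are illustrative):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_predict
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=300, n_features=20, n_informative=10,
                           n_classes=3, random_state=1)
clfs = [LogisticRegression(max_iter=1000), GaussianNB()]
n_classes = np.unique(y).shape[0]

# Option 1: one block of class probabilities per base model.
blend_proba = np.zeros((X.shape[0], len(clfs) * n_classes))
# Option 2: one column of hard class labels per base model.
blend_label = np.zeros((X.shape[0], len(clfs)))

for j, clf in enumerate(clfs):
    # Out-of-fold predictions, as in the original stacking recipe.
    blend_proba[:, j * n_classes:(j + 1) * n_classes] = \
        cross_val_predict(clf, X, y, cv=5, method="predict_proba")
    blend_label[:, j] = cross_val_predict(clf, X, y, cv=5)

print(blend_proba.shape, blend_label.shape)  # (300, 6) (300, 2)
```

Either matrix can then be fed to the level-2 logistic regression.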


There is also the issue of the feature matrix growing as you loop over the classifiers. I use the following:

db_train = np.zeros((X_train.shape[0], np.unique(y).shape[0]))
db_test = clf.predict_proba(X_test)
...
try:
    dataset_blend_train
except NameError:
    dataset_blend_train = db_train
else:
    dataset_blend_train = np.hstack((dataset_blend_train, db_train))
try:
    dataset_blend_test
except NameError:
    dataset_blend_test = db_test
else:
    dataset_blend_test = np.hstack((dataset_blend_test, db_test))
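An equivalent but simpler way to grow the feature matrix (my own sketch, not the code above) is to collect each classifier's block in a list and stack once at the end, which avoids the try/NameError checks:

```python
import numpy as np

def blend_blocks(blocks):
    """Stack per-classifier probability blocks into one feature matrix."""
    return np.hstack(blocks)

# Hypothetical blocks: two classifiers, 4 samples, 3 classes each.
b1 = np.full((4, 3), 0.5)
b2 = np.full((4, 3), 0.25)
blend = blend_blocks([b1, b2])
print(blend.shape)  # (4, 6)
```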
