There are two problems here:
1) a warning about missing labels, and 2) all-0 predictions.
The warning means that some of your classes are missing from the training data. This is a common problem: if you have 400 classes, some of them are probably very rare, and with any train/test split a few classes may end up entirely on one side of the split. There may also be classes that never appear in your data at all. You can check Y.sum(axis=0).all(); if it is False, some classes do not appear in Y even once. This sounds alarming, but realistically you cannot correctly predict classes that occur 0, 1, or some other very small number of times, so predicting 0 for them is probably about the best you can do.
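As a quick sketch of that check, assuming Y is a binary indicator matrix of shape (n_samples, n_classes) (the toy data below is made up for illustration):

```python
import numpy as np

# Hypothetical toy label matrix: 5 samples, 4 classes; class 3 never appears
Y = np.array([
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 0],
    [1, 1, 0, 0],
])

counts = Y.sum(axis=0)            # positives per class: [3 2 1 0]
print(bool(counts.all()))         # False -> at least one class never appears

missing = np.flatnonzero(counts == 0)
print("classes with no positive examples:", missing)  # [3]
```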
As for the all-0 predictions: with 400 classes, probably every class occurs much less than half the time. You can check Y.mean(axis=0).max() to get the maximum label frequency; with 400 classes it may be only a few percent. If so, each binary classifier, which has to make a 0/1 prediction for its class, will probably choose 0 for every class on every instance. This is not really a mistake; it is simply a consequence of all class frequencies being low.
If you know that each instance has at least one positive label, you can get the decision values ( clf.decision_function ) and select the class with the highest value for each instance. You will have to write a little code for that, though.
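A possible sketch of that idea, assuming a scikit-learn OneVsRestClassifier over LinearSVC (the dataset here is synthetic, generated just for the example): wherever the thresholded prediction is empty, set the highest-scoring class to 1.

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

# Synthetic multilabel data (hypothetical stand-in for your X, Y)
X, Y = make_multilabel_classification(
    n_samples=200, n_classes=20, allow_unlabeled=False, random_state=0
)
clf = OneVsRestClassifier(LinearSVC()).fit(X, Y)

scores = clf.decision_function(X)   # shape (n_samples, n_classes)
pred = clf.predict(X)               # may be all zeros for rare classes

# Guarantee at least one positive label per instance: where the row is
# empty, turn on the class with the highest decision value.
empty = pred.sum(axis=1) == 0
pred[empty, scores[empty].argmax(axis=1)] = 1
assert (pred.sum(axis=1) >= 1).all()
```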
I once placed in the top 10 of a Kaggle contest that looked a lot like this one. It was a multilabel problem with ~200 classes, none of which occurred even 10% of the time, and 0/1 predictions were required. In that case I took the decision values and predicted the highest-scoring class, plus everything above a threshold; I chose the threshold that worked best on a holdout set. The code for that entry is on GitHub: Kaggle Greek Media code . You may want to take a look at it.
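The "top class plus everything above a threshold" rule can be sketched in a few lines; scores here is a made-up decision matrix, and in practice the threshold would be tuned on a holdout set rather than fixed at 0:

```python
import numpy as np

# Hypothetical decision values for 3 instances and 3 classes
scores = np.array([
    [ 0.4, -0.2, -1.0],
    [-0.8, -0.5, -0.3],
    [ 0.1,  0.2, -0.9],
])
threshold = 0.0  # illustrative; tune on a holdout set

# Everything above the threshold...
pred = (scores > threshold).astype(int)
# ...plus the top-scoring class for every instance
pred[np.arange(len(scores)), scores.argmax(axis=1)] = 1
print(pred)  # [[1 0 0] [0 0 1] [1 1 0]]
```

Note that the second instance has no score above the threshold, so only the argmax rule fires for it; this is what guarantees at least one label per instance.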
If you made it this far, thanks for reading. Hope this helps.
Dthal