ValueError: this solver needs samples from at least 2 classes in the data, but the data contains only one class: 1.0

I have a training set of 8670 samples, each a sequence of 125 time samples, and my test set consists of 578 sequences. When I apply the SVM classifier from scikit-learn, I get fairly good results.

However, when I apply logistic regression, this error occurs:

"ValueError: this solver needs samples from at least 2 classes in the data, but the data contains only one class: 1.0."

My question is: why can SVM make predictions, while logistic regression raises this error?

Is something wrong with the data set, or was logistic regression simply unable to classify because of how the training patterns are arranged?
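The error itself is easy to understand: a binary classifier cannot be fit when the labels it sees contain only one class. A minimal sanity check (the array names here are illustrative, not from the original post) counts the distinct labels before fitting:

```python
import numpy as np

# Hypothetical label arrays for illustration.
y_ok = np.array([0.0, 1.0, 1.0, 0.0])   # two classes present
y_bad = np.array([1.0, 1.0, 1.0, 1.0])  # only one class present

def n_classes(y):
    """Number of distinct labels in y."""
    return np.unique(y).size

print(n_classes(y_ok))   # 2
print(n_classes(y_bad))  # 1
```

If this prints 1 for the labels a solver actually receives, any two-class solver will refuse to fit, which is exactly the ValueError above.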

1 answer

I found this in an issue about a similar linear model in sparkit-learn: https://github.com/lensacom/sparkit-learn/issues/49

"Unfortunately, this is a bug. Sparkit fits the linear sklearn models sequentially and then averages them in the reduce stage. There is at least one block that contains only one of the labels. To check, try the following:

train_Z[:, 'y']._rdd.map(lambda x: np.unique(x).size).filter(lambda x: x < 2).count() 

As a workaround, you can shuffle the training data to avoid single-label blocks, but a proper fix is still pending."

EDIT: I found a solution; the error analysis above was correct. Here is the fix.

To shuffle both arrays in the same order, I used the scikit-learn utils module:

 from sklearn.utils import shuffle

 X_shuf, Y_shuf = shuffle(X_transformed, Y)

Then train your model again on these shuffled arrays and it will work!
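If you prefer not to depend on the sklearn helper, the same consistent shuffle can be done with a single NumPy permutation applied to both arrays (the arrays and seed below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(42)

X = np.arange(12).reshape(6, 2)   # illustrative features: row i is [2i, 2i+1]
y = np.array([0, 0, 0, 1, 1, 1])  # illustrative labels

# One permutation, applied to both arrays, keeps rows and labels aligned.
perm = rng.permutation(len(y))
X_shuf, y_shuf = X[perm], y[perm]

# Each shuffled row still carries its original label.
assert all(y_shuf[k] == y[X_shuf[k, 0] // 2] for k in range(len(y)))
```

The key point is using the same permutation for features and labels; shuffling them independently would destroy the pairing rather than fix the block problem.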
