I am using scikit-learn to perform logistic regression on spam/ham data. X_train is my training data and y_train holds the labels ("spam" or "ham"). I trained my logistic regression as follows:
classifier = LogisticRegression()
classifier.fit(X_train, y_train)
If I want to get the accuracy from 10-fold cross-validation, I just write:
accuracy = cross_val_score(classifier, X_train, y_train, cv=10)
I thought it was possible to compute precision and recall as well by simply adding one parameter, like this:
precision = cross_val_score(classifier, X_train, y_train, cv=10, scoring='precision')
recall = cross_val_score(classifier, X_train, y_train, cv=10, scoring='recall')
But this raises a ValueError:
ValueError: pos_label=1 is not a valid label: array(['ham', 'spam'], dtype='|S4')
Is this related to the data (should I binarize the labels?), or has the cross_val_score function changed?
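For reference, here is a minimal sketch of the two workarounds I have in mind. It assumes a recent scikit-learn where cross_val_score lives in sklearn.model_selection, and it uses synthetic data from make_classification as a stand-in for my real spam/ham features. The names precision_scorer, recall_scorer and y_binary are only illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, precision_score, recall_score
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the spam/ham data (assumption, not the real dataset).
X_train, y_num = make_classification(n_samples=200, n_features=10, random_state=0)
y_train = np.where(y_num == 1, "spam", "ham")

classifier = LogisticRegression()

# Option 1: keep the string labels and tell the scorer which one is positive.
precision_scorer = make_scorer(precision_score, pos_label="spam")
recall_scorer = make_scorer(recall_score, pos_label="spam")
precision = cross_val_score(classifier, X_train, y_train, cv=10, scoring=precision_scorer)
recall = cross_val_score(classifier, X_train, y_train, cv=10, scoring=recall_scorer)

# Option 2: binarize the labels so the default pos_label=1 is a valid label.
y_binary = (y_train == "spam").astype(int)
precision_bin = cross_val_score(classifier, X_train, y_binary, cv=10, scoring="precision")
recall_bin = cross_val_score(classifier, X_train, y_binary, cv=10, scoring="recall")
```

Option 1 leaves y_train untouched, while option 2 changes the labels themselves, so I am not sure which one is the intended way.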
Thank you in advance!
python scikit-learn precision machine-learning logistic-regression
Anil narassiguin