I am trying to use weighted samples in scikit-learn while training a Random Forest classifier. It works fine when I pass the sample weights directly to the classifier, e.g. RandomForestClassifier().fit(X, y, sample_weight=weights), but when I tried a grid search to find the best hyperparameters for the classifier, I hit a wall:
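To be concrete, the direct case that works looks roughly like this (toy data for illustration; in my real case the weights come from elsewhere):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy data and arbitrary per-sample weights, just to show the call.
X, y = make_classification(n_samples=100, random_state=0)
weights = np.random.RandomState(0).rand(100)

# Passing sample_weight directly to fit() works without issue.
clf = RandomForestClassifier(n_estimators=10, random_state=0)
clf.fit(X, y, sample_weight=weights)
```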
To pass the weights through the grid search, I used the following:

grid_search = GridSearchCV(RandomForestClassifier(), params, n_jobs=-1, fit_params={"sample_weight": weights})
The problem is that the cross-validator isn't aware of the sample weights and so doesn't resample them together with the actual data, which makes the grid_search.fit(X, y) call fail: the cross-validator creates subsets of X and y, sub_X and sub_y, and eventually the classifier is called with classifier.fit(sub_X, sub_y, sample_weight=weights), but now the weights haven't been resampled, so an exception is thrown.
For now I have worked around the problem by over-sampling high-weight samples before training the classifier, but that's only a temporary work-around. Any suggestions on how to proceed?
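For reference, the over-sampling work-around I mean is roughly the following sketch (the helper name and the proportionality constant are my own choices, not anything from scikit-learn): each sample is replicated roughly in proportion to its weight, so a plain unweighted fit approximates the weighted fit.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def oversample_by_weight(X, y, weights, scale=10, seed=0):
    """Replicate each sample roughly in proportion to its weight."""
    counts = np.maximum(1, np.round(np.asarray(weights) * scale).astype(int))
    idx = np.repeat(np.arange(len(y)), counts)
    np.random.RandomState(seed).shuffle(idx)
    return X[idx], y[idx]

# Toy data just to demonstrate the shape of the work-around.
rng = np.random.RandomState(0)
X = rng.rand(100, 4)
y = (X[:, 0] > 0.5).astype(int)
weights = rng.rand(100)

X_over, y_over = oversample_by_weight(X, y, weights)
clf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X_over, y_over)
```

The obvious downsides are the coarse quantization of the weights and the inflated training-set size, which is why I'd prefer a way to make GridSearchCV handle sample_weight properly.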
python scikit-learn machine-learning
Roee Shenberg