As an example of cross-validation without preprocessing, I can do something like this:

    from sklearn.linear_model import SGDClassifier
    from sklearn.model_selection import GridSearchCV  # sklearn.grid_search in versions before 0.18

    # Hyperparameter grid to search over
    tuned_params = [{"penalty": ["l2", "l1"]}]

    SGD = SGDClassifier()
    clf = GridSearchCV(SGD, tuned_params, verbose=5)
    clf.fit(x_train, y_train)

I would like to pre-process my data using something like:

    from sklearn import preprocessing
    x_scaled = preprocessing.scale(x_train)

But it would not be a good idea to do this before setting up cross-validation, because then the training and test sets would be normalized together, and statistics from each held-out fold would leak into training. How can I configure cross-validation so that the scaling is fitted on the training portion of each split and then applied to the corresponding test portion?
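
For illustration, here is a minimal sketch of the kind of setup I am after, assuming that wrapping the scaler and the classifier in scikit-learn's `Pipeline` is an acceptable approach. The synthetic data, the step names `"scale"` and `"sgd"`, `cv=3`, and `random_state=0` are placeholders of mine, not part of my real code. As far as I understand, `GridSearchCV` refits the whole pipeline on the training part of each split, so the scaler should only ever see training data when it is fitted.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (hypothetical; replace with the real x_train, y_train).
rng = np.random.RandomState(0)
x_train = rng.normal(size=(120, 5)) * [1.0, 10.0, 100.0, 0.1, 5.0]
y_train = (x_train[:, 0] + x_train[:, 1] / 10.0 > 0).astype(int)

# Chain scaling and classification so they are fitted together on each training split.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("sgd", SGDClassifier(random_state=0)),
])

# Pipeline parameters are addressed as "<step name>__<parameter name>".
tuned_params = [{"sgd__penalty": ["l2", "l1"]}]

clf = GridSearchCV(pipe, tuned_params, cv=3)
clf.fit(x_train, y_train)
print(clf.best_params_)
```

Calling `clf.predict(x_test)` afterwards would apply the scaler fitted on the full training set before classifying, so no separate call to `preprocessing.scale` should be needed.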
python scikit-learn
Feish