Why does GridSearchCV not give the best result? - Scikit Learn

I have a dataset with 158 rows and 10 columns. I am building several linear regression models to predict future values.

I used GridSearchCV to tune the model's hyperparameters.

Here is my GridSearchCV and regression function:

    from sklearn import cross_validation, linear_model
    from sklearn.grid_search import GridSearchCV

    def GridSearch(data):
        # ground_truth_data (the target vector) is defined elsewhere
        # Hold out 30% of the rows as a test set
        X_train, X_test, y_train, y_test = cross_validation.train_test_split(
            data, ground_truth_data, test_size=0.3, random_state=0)

        # Grid over LinearRegression's constructor options
        parameters = {'fit_intercept': [True, False],
                      'normalize': [True, False],
                      'copy_X': [True, False]}
        model = linear_model.LinearRegression()

        grid = GridSearchCV(model, parameters)
        grid.fit(X_train, y_train)
        predictions = grid.predict(X_test)

        print "Grid best score: ", grid.best_score_
        print "Grid score function: ", grid.score(X_test, y_test)

The output of this code is:

Grid best score:  0.720298870251

Grid score function:  0.888263112299

My question is: what is the difference between best_score_ and the score function?

How can the score on the test set be higher than best_score_?

Thanks in advance.

1 answer

best_score_ is the best cross-validation score. That is, the model is fit on part of the training data, and the score is computed by predicting the rest of the training data. This is because you passed X_train and y_train to fit; the fit process therefore knows nothing about your test set, only your training set.
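To see this concretely, here is a minimal sketch (not part of the original answer) using a synthetic stand-in for the question's 158 x 10 dataset; module paths follow the old scikit-learn API used in the question (newer versions moved these to sklearn.model_selection). It shows that best_score_ is a cross-validation score computed on the training data only:

    from sklearn import linear_model
    from sklearn.datasets import make_regression
    from sklearn.cross_validation import train_test_split, cross_val_score
    from sklearn.grid_search import GridSearchCV

    # Synthetic stand-in for the question's 158-row, 10-column dataset
    X, y = make_regression(n_samples=158, n_features=10, noise=10.0, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0)

    parameters = {'fit_intercept': [True, False], 'normalize': [True, False]}
    grid = GridSearchCV(linear_model.LinearRegression(), parameters, cv=3)
    grid.fit(X_train, y_train)  # the test set is never seen here

    # Re-running cross-validation by hand with the best parameters
    # reproduces best_score_ (up to the fold weighting GridSearchCV applies):
    best_model = linear_model.LinearRegression(**grid.best_params_)
    print "best_score_: ", grid.best_score_
    print "manual CV:   ", cross_val_score(best_model, X_train, y_train, cv=3).mean()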

The score method of the grid object evaluates the model on whatever data you provide. You passed X_test and y_test, so this call computes the score of the fitted (i.e., tuned) model on the test set.
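Continuing the sketch above: because GridSearchCV by default refits the best parameter combination on the whole training set (refit=True), grid.score on held-out data is simply the test-set score of that single refit model:

    # grid.score delegates to the refit best estimator; both lines print
    # the same number: the R^2 of the refit model on the held-out test set.
    print "grid.score:           ", grid.score(X_test, y_test)
    print "best_estimator_.score:", grid.best_estimator_.score(X_test, y_test)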

In short, the two scores are computed on different data sets, so it should not be surprising that they differ.
