No, best subset selection is not implemented. The easiest way to do it is to write it yourself. This should get you started:
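
    from itertools import chain, combinations

    import numpy as np
    # sklearn.cross_validation was removed in newer scikit-learn releases;
    # cross_val_score now lives in sklearn.model_selection.
    from sklearn.model_selection import cross_val_score

    def best_subset_cv(estimator, X, y, cv=3):
        n_features = X.shape[1]
        # Enumerate every non-empty subset of column indices, sizes 1..n_features.
        subsets = chain.from_iterable(combinations(range(n_features), k + 1)
                                      for k in range(n_features))

        best_score = -np.inf
        best_subset = None
        for subset in subsets:
            # Mean cross-validated score using only the selected columns.
            score = cross_val_score(estimator, X[:, list(subset)], y, cv=cv).mean()
            if score > best_score:
                best_score, best_subset = score, subset

        return best_subset, best_score
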
This performs k-fold cross-validation inside the loop, so it will fit about k · 2^p estimators when given data with p features (exactly k(2^p - 1), since the empty subset is skipped).
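To get a feel for how quickly that cost grows, here is a minimal sketch that counts the subsets and the resulting number of fits. The helper names `count_subsets` and `count_fits` are made up for illustration and are not part of scikit-learn; the count assumes one fit per fold per non-empty subset.

```python
from itertools import chain, combinations


def count_subsets(n_features):
    """Count non-empty feature subsets by actually enumerating them."""
    subsets = chain.from_iterable(
        combinations(range(n_features), k + 1) for k in range(n_features)
    )
    return sum(1 for _ in subsets)


def count_fits(n_features, cv=3):
    """Closed form: one fit per fold for each of the 2^p - 1 non-empty subsets."""
    return cv * (2 ** n_features - 1)


# Print the number of fits for a few feature counts with 3-fold CV.
for p in (5, 10, 20):
    print(p, count_fits(p, cv=3))
```

Because the count roughly doubles with every added feature, exhaustive search like this is probably only practical for a small number of features.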