10 * 10x cross-validation in scikit-learn?

Question


Is using

class sklearn.cross_validation.ShuffleSplit(
    n, 
    n_iterations=10, 
    test_fraction=0.10000000000000001, 
    indices=True, 
    random_state=None
)

the right way to do 10 * 10x CV in scikit-learn (changing random_state to 10 different numbers)?

I ask because I did not find a random_state parameter in StratifiedKFold or KFold, and KFold on its own always produces the same folds for the same data.
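
For reference, newer scikit-learn releases moved these classes to `sklearn.model_selection`, where `KFold` and `StratifiedKFold` do accept `shuffle=True` together with `random_state`. A minimal sketch on hypothetical toy data (20 samples; the seeds 0 and 1 are arbitrary choices) showing that the same seed reproduces the same folds, a different seed gives a different partition, and the folds still cover every sample once:

```python
import numpy as np
from sklearn.model_selection import KFold

# Hypothetical toy data: 20 samples with a single feature.
X = np.arange(20).reshape(-1, 1)


def shuffled_test_folds(seed):
    """Return the list of test-index arrays of a shuffled 10-fold KFold."""
    kf = KFold(n_splits=10, shuffle=True, random_state=seed)
    return [test for _, test in kf.split(X)]


folds_a = shuffled_test_folds(0)
folds_b = shuffled_test_folds(0)  # same seed again
folds_c = shuffled_test_folds(1)  # different seed

same_seed_identical = all(np.array_equal(a, b) for a, b in zip(folds_a, folds_b))
other_seed_differs = any(not np.array_equal(a, c) for a, c in zip(folds_a, folds_c))

# Whatever the seed, the 10 test folds still partition the 20 samples.
covers_all_once = np.array_equal(np.sort(np.concatenate(folds_c)), np.arange(20))

print(same_seed_identical, other_seed_differs, covers_all_once)
```

Repeating this with 10 different seeds is one way to get 10 reshuffled rounds of 10-fold CV.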

If ShuffleSplit is correct, one problem is that the documentation says:

Note: contrary to other cross-validation strategies, random splits do not guarantee that all folds will be different, although this is still very likely for large datasets.

Does this caveat also apply when doing 10 * 10 CV?
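
The caveat comes from how ShuffleSplit works: each iteration draws its test set independently, so 10 iterations at 10% do not necessarily partition the data the way one round of 10-fold CV does. A small sketch, written against the newer `sklearn.model_selection` API (100 dummy samples and seed 0 are arbitrary assumptions), that counts how often each sample ends up in a test set:

```python
import numpy as np
from sklearn.model_selection import KFold, ShuffleSplit

n_samples = 100
X = np.zeros((n_samples, 1))  # only the number of rows matters for the indices


def times_in_test(splitter):
    """Count how many times each sample lands in a test set over all splits."""
    counts = np.zeros(n_samples, dtype=int)
    for _, test in splitter.split(X):
        counts[test] += 1
    return counts


# One round of 10-fold CV: every sample is tested exactly once.
kfold_counts = times_in_test(KFold(n_splits=10, shuffle=True, random_state=0))

# 10 independent splits of 10 test samples each (10%): test sets may overlap,
# so some samples can be tested several times and others never.
ss_counts = times_in_test(ShuffleSplit(n_splits=10, test_size=10, random_state=0))

print("KFold histogram:", np.bincount(kfold_counts))
print("ShuffleSplit histogram:", np.bincount(ss_counts))
```

For KFold the histogram has a single non-empty bin (every sample tested once); for ShuffleSplit the same 100 test slots are typically spread over 0, 1, 2 or more uses per sample.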


python scikit-learn machine-learning scikits

Flake Nov 26 '11 at 19:36


ogrisel · Accepted Answer · 2011-11-26T20:05:39+0000

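ShuffleSplit does not give exactly 10 * 10-fold CV: its test sets are drawn independently, so a group of 10 iterations is not guaranteed to form 10 disjoint folds. If that approximation is acceptable, you do not need to change random_state 10 times; just ask for 100 iterations with a 10% test fraction and a fixed seed, for instance: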

>>> from sklearn.cross_validation import ShuffleSplit  # 2011-era module
>>> ss = ShuffleSplit(X.shape[0], n_iterations=100, test_fraction=0.1,
...     random_state=42)
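
In newer scikit-learn versions the same idea is spelled `ShuffleSplit(n_splits=..., test_size=...)` in `sklearn.model_selection`. A minimal sketch, using the bundled iris dataset and a logistic regression purely as stand-ins for your own `X`, `y` and classifier:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import ShuffleSplit, cross_val_score

# Placeholder data and model; substitute your own.
X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000)

# 100 independent random splits, each holding out 10% of the samples.
ss = ShuffleSplit(n_splits=100, test_size=0.1, random_state=42)
scores = cross_val_score(clf, X, y, cv=ss)
print(len(scores), scores.mean())
```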

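If you want true 10 * 10-fold CV, shuffle the data 10 times with different seeds and run StratifiedKFold with k = 10 on each shuffled copy (100 evaluations in total, each training on 90% of the data and testing on the remaining 10%):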

>>> from sklearn.utils import shuffle
>>> from sklearn.cross_validation import StratifiedKFold, cross_val_score
>>> for i in range(10):
...    X, y = shuffle(X_orig, y_orig, random_state=i)
...    skf = StratifiedKFold(y, 10)
...    print(cross_val_score(clf, X, y, cv=skf))
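
Recent scikit-learn versions package this shuffle-then-stratified-10-fold loop as `RepeatedStratifiedKFold` in `sklearn.model_selection`. A sketch, again with iris and logistic regression as placeholder data and model, that also checks every sample is used for testing 10 times in total (once per repetition, since each repetition is an ordinary stratified 10-fold split):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

# Placeholder data and model; substitute your own.
X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000)

# 10 repetitions of stratified 10-fold CV, reshuffled before each repetition.
rskf = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=0)
scores = cross_val_score(clf, X, y, cv=rskf)

# Splits are generated repetition by repetition, so row i holds repetition i.
per_repeat = scores.reshape(10, 10)
print("mean per repetition:", per_repeat.mean(axis=1))
print("overall: %.3f +/- %.3f" % (scores.mean(), scores.std()))

# Sanity check: across the 10 repetitions each sample is tested exactly 10 times.
times_tested = np.zeros(len(y), dtype=int)
for _, test in rskf.split(X, y):
    times_tested[test] += 1
print("every sample tested 10 times:", bool((times_tested == 10).all()))
```

Passing an integer random_state makes the whole 10 * 10 schedule reproducible, which replaces the manual `shuffle(..., random_state=i)` loop.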
