Sklearn KFold: access a single fold instead of a for loop

After using cross_validation.KFold(n, n_folds=folds), I would like to access the training and testing indices of a single fold, instead of going through all the folds.

Take this example code:

    import numpy as np
    from sklearn import cross_validation

    X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
    y = np.array([1, 2, 3, 4])
    kf = cross_validation.KFold(4, n_folds=2)

    >>> print(kf)
    sklearn.cross_validation.KFold(n=4, n_folds=2, shuffle=False, random_state=None)
    >>> for train_index, test_index in kf:

I would like to access the first fold in kf like this (instead of using a for loop):

 train_index, test_index in kf[0] 

This should return only the first fold, but instead I get an error: "TypeError: 'KFold' object does not support indexing"

What I want as output:

    >>> train_index, test_index in kf[0]
    >>> print("TRAIN:", train_index, "TEST:", test_index)
    TRAIN: [2 3] TEST: [0 1]

Link: http://scikit-learn.org/stable/modules/generated/sklearn.cross_validation.KFold.html

Question

How can I get the training and test indices for only one fold, without going through the whole for loop?

+11
python scikit-learn cross-validation
2 answers

You are on the right track. All you need to do now is:

    kf = cross_validation.KFold(4, n_folds=2)
    mylist = list(kf)
    train, test = mylist[0]

kf actually behaves like a generator: it is a lazy iterable that does not compute a train-test split until it is needed. This improves memory usage, since you do not store items you never use. Creating a list from the KFold object forces it to compute all the splits up front, which is what makes them indexable.
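To make the laziness concrete, here is a minimal sketch. It assumes a newer scikit-learn (0.18 or later), where KFold lives in sklearn.model_selection and you iterate over kf.split(X) rather than over kf itself. It takes one fold with next, and separately materializes every fold with list:

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
kf = KFold(n_splits=2)

# split() is lazy: no fold is computed until one is requested.
splits = kf.split(X)
first_train, first_test = next(splits)

# list() forces every fold to be computed, which makes them indexable.
all_folds = list(kf.split(X))
second_train, second_test = all_folds[1]

print("fold 0:", first_train, first_test)    # fold 0: [2 3] [0 1]
print("fold 1:", second_train, second_test)  # fold 1: [0 1] [2 3]
```

If you only ever need the first fold, next is enough. The list version is convenient when you want to jump between folds by index.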

Here are two great SO questions that explain what generators are: one and two


Edit November 2018

The API has changed: as of sklearn 0.20 the cross_validation module is gone, and KFold lives in sklearn.model_selection. Updated example (for py3.6):

    from sklearn.model_selection import KFold
    import numpy as np

    kf = KFold(n_splits=2)  # two folds, matching n_folds=2 in the question
    X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
    # split() yields index arrays, not slices of X
    train_index, test_index = next(kf.split(X))

    In [12]: train_index
    Out[12]: array([2, 3])

    In [13]: test_index
    Out[13]: array([0, 1])
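If you want a fold other than the first without building the full list, itertools.islice can skip ahead in the lazy split. The sketch below is only one way to do it; nth_fold is a hypothetical helper name, not part of scikit-learn:

```python
from itertools import islice

import numpy as np
from sklearn.model_selection import KFold


def nth_fold(cv, X, i):
    """Return (train_index, test_index) for fold i (hypothetical helper)."""
    # islice consumes the first i folds, then next() takes fold i,
    # so folds after i are never generated.
    return next(islice(cv.split(X), i, None))


X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
kf = KFold(n_splits=4)

train_index, test_index = nth_fold(kf, X, 2)
print("TRAIN:", train_index, "TEST:", test_index)  # TRAIN: [0 1 3] TEST: [2]
```

Asking for a fold number that does not exist raises StopIteration, so check i against kf.get_n_splits() first if the index comes from user input.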
+19
    # Save every K-fold sample in a separate list, then access one fold with [i]
    from sklearn.model_selection import KFold
    import numpy as np
    import pandas as pd

    kf = KFold(n_splits=4)
    X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
    Y = np.array([0, 0, 0, 1])
    Y = Y.reshape(4, 1)
    X = pd.DataFrame(X)
    Y = pd.DataFrame(Y)

    X_train_base = []
    X_test_base = []
    Y_train_base = []
    Y_test_base = []

    for train_index, test_index in kf.split(X):
        X_train, X_test = X.iloc[train_index, :], X.iloc[test_index, :]
        Y_train, Y_test = Y.iloc[train_index, :], Y.iloc[test_index, :]
        X_train_base.append(X_train)
        X_test_base.append(X_test)
        Y_train_base.append(Y_train)
        Y_test_base.append(Y_test)

    print(X_train_base[0])
    print(Y_train_base[0])
    print(X_train_base[1])
    print(Y_train_base[1])
0
