NumPy: How can I select specific indices in an np array for k-fold cross-validation?

Question


I have a 5000 x 3027 training matrix (from the CIFAR-10 dataset). Using array_split in NumPy, I split it into 5 parts, and I want to hold out one part at a time as the cross-validation fold. My problem arises when I index with something like XTrain[[indexes]], where the indices are an array such as [0, 1, 2, 3]: the result is a 3D tensor of shape 4 x 1000 x 3027, not a matrix. How can I collapse the "4 x 1000" into 4000 rows to get a 4000 x 3027 matrix?

    for fold in range(len(X_train_folds)):
        indexes = np.delete(np.arange(len(X_train_folds)), fold)
        XTrain = X_train_folds[indexes]
        X_cv = X_train_folds[fold]
        yTrain = y_train_folds[indexes]
        y_cv = y_train_folds[fold]
        classifier.train(XTrain, yTrain)
        dists = classifier.compute_distances_no_loops(X_cv)
        y_test_pred = classifier.predict_labels(dists, k)
        num_correct = np.sum(y_test_pred == y_test)
        accuracy = float(num_correct/num_test)
        k_to_accuracy[k] = accuracy

+5

python arrays numpy machine-learning cross-validation

kwotsin May 22, '16 at 3:56


2 answers

Perhaps you can try this instead (I'm new to NumPy, so if I'm doing something inefficient or wrong, I'd be happy to be corrected):

    import numpy as np

    # Split the data and labels into num_folds lists of arrays.
    X_train_folds = np.array_split(X_train, num_folds)
    y_train_folds = np.array_split(y_train, num_folds)

    k_to_accuracies = {}
    for k in k_choices:
        k_to_accuracies[k] = []
        for i in range(num_folds):
            # Train on every fold except fold i; validate on fold i.
            training_data = np.concatenate(X_train_folds[:i] + X_train_folds[i+1:])
            test_data = X_train_folds[i]
            training_labels = np.concatenate(y_train_folds[:i] + y_train_folds[i+1:])
            test_labels = y_train_folds[i]
            classifier.train(training_data, training_labels)
            predicted_labels = classifier.predict(test_data, k)
            # The mean of the boolean matches is the accuracy (no integer division).
            k_to_accuracies[k].append(np.mean(predicted_labels == test_labels))
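To address the shape problem from the question directly: stacking the folds into one array and indexing it with a list of fold numbers keeps the fold axis, whereas np.concatenate (or a reshape) joins the selected folds along the row axis. Here is a minimal sketch of that difference, assuming small random stand-in data rather than CIFAR-10; the array sizes and the names `stacked`, `joined`, and `reshaped` are only illustrative.

```python
import numpy as np

# Stand-in data: 50 samples x 7 features (assumed sizes, not CIFAR-10).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 7))
num_folds = 5

# array_split returns a Python list of 5 arrays, each 10 x 7.
X_train_folds = np.array_split(X_train, num_folds)

fold = 2  # fold held out for validation
indexes = [i for i in range(num_folds) if i != fold]

# Indexing a stacked array with a list of folds keeps the fold axis.
stacked = np.array(X_train_folds)[indexes]

# Option 1: concatenate the selected folds along axis 0.
joined = np.concatenate([X_train_folds[i] for i in indexes])

# Option 2: collapse the first two axes of the 3D array.
reshaped = stacked.reshape(-1, stacked.shape[-1])

print(stacked.shape, joined.shape, reshaped.shape)
```

The reshape variant only applies when all folds have the same number of rows; np.concatenate also handles the uneven folds that array_split produces when the sample count is not divisible by the number of folds.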

+3

Abhas sinha Dec 26 '16 at 21:48


Imanol luengo · Accepted Answer · 2016-05-22T14:47:06+0000

I would suggest using the scikit-learn package. It already comes with many common machine learning tools, such as the K-fold cross-validation generator:

    >>> from sklearn.cross_validation import KFold
    >>> X = # your data [samples x features]
    >>> y = # ground-truth labels
    >>> kf = KFold(X.shape[0], n_folds=5)

And then iterate through kf:

    >>> for train_index, test_index in kf:
    ...     X_train, X_test = X[train_index], X[test_index]
    ...     y_train, y_test = y[train_index], y[test_index]
    ...     # do something

The above loop will be executed n_folds times, each time with different training and testing indices.
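Note that in scikit-learn 0.20 and later the sklearn.cross_validation module no longer exists; KFold now lives in sklearn.model_selection, takes n_splits, and is iterated via kf.split(X). To show what such index pairs look like without depending on either version, here is a rough NumPy-only sketch. The helper `kfold_indices` is hypothetical and written for illustration; it is not part of scikit-learn, and it uses contiguous, unshuffled folds.

```python
import numpy as np

def kfold_indices(n_samples, n_folds):
    """Yield (train_index, test_index) pairs over contiguous folds.

    Hypothetical helper for illustration; not a scikit-learn function.
    """
    # Split the sample indices (not the data) into n_folds chunks.
    folds = np.array_split(np.arange(n_samples), n_folds)
    for i in range(n_folds):
        test_index = folds[i]
        # All remaining folds, joined into one 1D index array.
        train_index = np.concatenate(folds[:i] + folds[i + 1:])
        yield train_index, test_index

# Toy data: 10 samples x 2 features (assumed sizes).
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

splits = list(kfold_indices(len(X), n_folds=5))
for train_index, test_index in splits:
    # Indexing rows with a 1D integer array keeps the result 2D.
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    print(X_train.shape, X_test.shape)
```

Because the rows are selected with a flat index array rather than by picking whole folds, the training matrix stays 2D, so the problem in the question does not arise.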
