How to adapt the scikit-learn face recognition example to a custom dataset

I am trying to adapt the scikit-learn face recognition example script for use on my own image dataset (the script runs fine as-is on my machine with Python 3 and scikit-learn 0.17).

The call to fetch_lfw_people() below is probably what needs to be modified; I want the script to skip the download step and use my own image folders instead.

That is, instead of extracting data from the downloaded folders, the script should load images from my own dataset located in '/User/pepe/images/' .

    # Download the data, if not already on disk, and load it as numpy arrays
    lfw_people = fetch_lfw_people(min_faces_per_person=70, resize=0.4)

    # introspect the images arrays to find the shapes (for plotting)
    n_samples, h, w = lfw_people.images.shape

    # for machine learning we use the 2D data directly (as relative pixel
    # positions info is ignored by this model)
    X = lfw_people.data
    n_features = X.shape[1]

    # the label to predict is the id of the person
    y = lfw_people.target
    target_names = lfw_people.target_names
    n_classes = target_names.shape[0]

    # etc...

Do you have any suggestions to fix this?

As you can see from the GitHub code, the heavy lifting is not done by fetch_lfw_people() itself but by the lfw.py module, which contains the supporting helpers.


You do not need to "change" anything; the function already provides an easy way to do this.

See:

https://github.com/scikit-learn/scikit-learn/blob/14031f6/sklearn/datasets/lfw.py#L229

The parameters data_home (set it to your own path!) and download_if_missing (disable it, i.e. pass False for it) exist for exactly this purpose!
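As a minimal sketch, the call might look like the following. Note that '/User/pepe/images/' is the asker's hypothetical path, and that data_home must contain the LFW data in the directory layout the loader expects (lfw_home/lfw_funneled/&lt;person name&gt;/*.jpg); arbitrary image folders still need to be arranged that way.

```python
from sklearn.datasets import fetch_lfw_people

# data_home points at the custom dataset root (hypothetical path);
# download_if_missing=False makes the loader raise instead of downloading.
try:
    lfw_people = fetch_lfw_people(
        data_home='/User/pepe/images/',
        min_faces_per_person=70,
        resize=0.4,
        download_if_missing=False,
    )
    print(lfw_people.images.shape)
except OSError as err:
    # Raised when no LFW data is found under data_home.
    print('LFW data not found:', err)
```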


I was able to change it to the following code, but I cannot compute the score. I can read the images and compare them against a sample image; I just do not know how to use the score function.

    from time import time
    import os

    import numpy
    from PIL import Image
    from sklearn.metrics import classification_report
    from sklearn.metrics import confusion_matrix
    from sklearn.cross_validation import train_test_split
    from sklearn.grid_search import GridSearchCV
    from sklearn.decomposition import RandomizedPCA
    from sklearn.svm import SVC

    # Path to the root image directory containing sub-directories of images
    path = "<Path to Folder of Training Images>"
    testImage = "<Path to test image>"

    X = []                 # flat image feature vectors
    Y = []                 # integer label for each image
    n_sample = 0           # total number of images
    h = 750                # image height
    w = 250                # image width
    n_features = 187500    # length of the feature vector (h * w)
    target_names = []      # names of the persons
    label_count = 0
    n_classes = 0

    for directory in os.listdir(path):
        for file in os.listdir(path + directory):
            print(path + directory + "/" + file)
            img = Image.open(path + directory + "/" + file)
            featurevector = numpy.array(img).flatten()
            print(len(featurevector))
            X.append(featurevector)
            Y.append(label_count)
            n_sample += 1
        target_names.append(directory)
        label_count += 1

    print(Y)
    print(target_names)
    n_classes = len(target_names)

    ###########################################################################
    # Split into a training set and a test set
    X_train, X_test, y_train, y_test = train_test_split(
        X, Y, test_size=0.25, random_state=42)

    ###########################################################################
    # Compute a PCA (eigenfaces) on the face dataset (treated as unlabeled
    # dataset): unsupervised feature extraction / dimensionality reduction
    n_components = 10

    print("Extracting the top %d eigenfaces from %d faces"
          % (n_components, len(X_train)))
    t0 = time()
    pca = RandomizedPCA(n_components=n_components, whiten=True).fit(X_train)
    print("done in %0.3fs" % (time() - t0))

    eigenfaces = pca.components_.reshape((n_components, h, w))

    print("Projecting the input data on the eigenfaces orthonormal basis")
    t0 = time()
    X_train_pca = pca.transform(X_train)
    X_test_pca = pca.transform(X_test)
    print("done in %0.3fs" % (time() - t0))

    ###########################################################################
    # Train an SVM classification model
    print("Fitting the classifier to the training set")
    t0 = time()
    param_grid = {'C': [1e3, 5e3, 1e4, 5e4, 1e5],
                  'gamma': [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.1]}
    clf = GridSearchCV(SVC(kernel='rbf', class_weight='balanced'), param_grid)
    clf = clf.fit(X_train_pca, y_train)
    print("done in %0.3fs" % (time() - t0))
    print("Best estimator found by grid search:")
    print(clf.best_estimator_)

    ###########################################################################
    # Quantitative evaluation of the model quality on the test set
    print("Predicting people names on the test set")
    t0 = time()
    y_pred = clf.predict(X_test_pca)
    print(clf.score(X_test_pca, y_test))
    print("done in %0.3fs" % (time() - t0))

    print(classification_report(y_test, y_pred, target_names=target_names))
    print(confusion_matrix(y_test, y_pred, labels=range(n_classes)))

    ###########################################################################
    # Prediction for a single user image based on the model
    test = []
    testImage = Image.open(testImage)
    testImageFeatureVector = numpy.array(testImage).flatten()
    test.append(testImageFeatureVector)
    testImagePCA = pca.transform(test)
    testImagePredict = clf.predict(testImagePCA)
    # print(clf.score(testImagePCA))
    # print(clf.score(X_train_pca, testImagePCA))
    # print(clf.best_params_)
    # print(clf.best_score_)
    # print(testImagePredict)
    print(target_names[testImagePredict[0]])
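On the scoring question: clf.score() computes mean accuracy over a labelled set, so it needs both features and true labels; that is why a call like clf.score(testImagePCA) with a single unlabelled image fails. For one image you want predict() plus a confidence measure such as decision_function(). A minimal sketch on synthetic data (the shapes and hyperparameters here are made up for illustration, not taken from the dataset above):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(42)
X_train = rng.rand(60, 10)       # 60 samples, 10 PCA features (toy data)
y_train = rng.randint(0, 3, 60)  # 3 classes

clf = SVC(kernel='rbf', C=1000.0, gamma=0.01).fit(X_train, y_train)

# score() requires true labels: it returns mean accuracy over a labelled set.
print('train accuracy:', clf.score(X_train, y_train))

# For a single unlabelled image there is nothing to score against;
# use predict() and decision_function() (per-class margins) instead.
x = rng.rand(1, 10)
print('predicted label:', clf.predict(x)[0])
print('class margins:', clf.decision_function(x))
```

If calibrated class probabilities are needed rather than raw margins, SVC can be constructed with probability=True, at the cost of extra cross-validation during fit.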
