Here is an alternative solution using mahotas and milk.
- Start by creating two directories, positives/ and negatives/, where you manually place a few examples of each class.
- I assume the rest of the data sits in the unlabeled/ directory.
- Compute features for all images in positives/ and negatives/.
- Train the classifier.
- Use this classifier on the unlabeled images.
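The first two steps above can be sketched as follows. The directory names come from the list above; the fallback file lists are made-up stand-ins so the snippet runs even without the directories in place:

```python
from glob import glob

# The two hand-labeled directories from the steps above; glob collects the
# image paths (the .jpg extension is an assumption, adjust as needed).
# The `or [...]` fallbacks are illustrative placeholders only.
negatives = glob('negatives/*.jpg') or ['negatives/a.jpg', 'negatives/b.jpg']
positives = glob('positives/*.jpg') or ['positives/c.jpg']

# Label convention used by the classifier later: 0 = negative, 1 = positive,
# in the same order as `negatives + positives`.
labels = [0] * len(negatives) + [1] * len(positives)
```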
In the code below, I used jug so that you can run it on multiple processors, but the code also works if you delete every line that mentions TaskGenerator:
from glob import glob

import mahotas
import mahotas.features
import milk
from jug import TaskGenerator


@TaskGenerator
def features_for(imname):
    img = mahotas.imread(imname)
    # Haralick texture features, averaged over the four directions
    return mahotas.features.haralick(img).mean(0)


@TaskGenerator
def learn_model(features, labels):
    learner = milk.defaultclassifier()
    return learner.train(features, labels)


@TaskGenerator
def classify(model, features):
    return model.apply(features)


positives = glob('positives/*.jpg')
negatives = glob('negatives/*.jpg')
unlabeled = glob('unlabeled/*.jpg')

# Labels: 0 = negative, 1 = positive, matching the order of the feature list
features = list(map(features_for, negatives + positives))
labels = [0] * len(negatives) + [1] * len(positives)

model = learn_model(features, labels)
labeled = [classify(model, features_for(u)) for u in unlabeled]
This uses texture features, which are probably good enough, but you can play with other features in mahotas.features if you want (or try mahotas.surf, but that gets more complicated). In general, I found it hard to perform classification with the hard thresholds you are looking for unless the scanning is very controlled.
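If you do experiment with several descriptors from mahotas.features, one common approach (a sketch; the inputs below are zero-filled placeholders, where in practice they would come from calls like mahotas.features.haralick(img).mean(0)) is to flatten and concatenate them into a single feature vector per image:

```python
import numpy as np

def combine_features(*descriptors):
    # Flatten each per-image descriptor and concatenate into one vector,
    # so any mix of feature outputs can feed the same classifier.
    return np.concatenate([np.asarray(d, dtype=float).ravel()
                           for d in descriptors])

# Stand-ins for real descriptors: haralick(img).mean(0) yields 13 values;
# 36 is an arbitrary placeholder for, e.g., a second descriptor's histogram.
haralick_like = np.zeros(13)
other_like = np.zeros(36)
vec = combine_features(haralick_like, other_like)
# vec is a single flat vector of length 13 + 36 = 49
```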