Your first code segment defines a classifier for 1-D data.
X represents feature vectors.
[0] is the feature vector of the first data example, [1] is the feature vector of the second data example, and so on. [[0], [1], [2], [3]] is the list of all data examples; each example has only one feature.
y represents labels.
The graph below shows the idea:

- Green nodes are data labeled 0
- Red nodes are data labeled 1
- Gray nodes are data with unknown labels.
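
Putting that together, a minimal sketch of what such a first code segment typically looks like (the exact values in y are an assumption, chosen to match the outputs discussed below):

from sklearn.neighbors import KNeighborsClassifier

X = [[0], [1], [2], [3]]   # four examples, one feature each
y = [0, 0, 1, 1]           # assumed labels: 0 = green, 1 = red
neigh = KNeighborsClassifier(n_neighbors=3)
neigh.fit(X, y)            # train the classifier on the labeled examples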
print(neigh.predict([[1.1]]))
This asks the classifier to predict the label for x = 1.1.
print(neigh.predict_proba([[0.9]]))
This asks the classifier to estimate the membership probability for each label.
Since both gray nodes are closer to the green nodes (for each query point, two of its three nearest neighbors are labeled green), the outputs below make sense:

[0]                            # green label
[[ 0.66666667  0.33333333]]    # green label has the greater probability
The second code segment is a good example of scikit-learn usage:
In the following example, we build a NearestNeighbors instance from an array representing our data set and ask for the closest point to [1, 1, 1]:
>>> samples = [[0., 0., 0.], [0., .5, 0.], [1., 1., .5]]
>>> from sklearn.neighbors import NearestNeighbors
>>> neigh = NearestNeighbors(n_neighbors=1)
>>> neigh.fit(samples)
NearestNeighbors(algorithm='auto', leaf_size=30, ...)
>>> print(neigh.kneighbors([[1., 1., 1.]]))
(array([[0.5]]), array([[2]]...))
There is no target value here because this is only the NearestNeighbors class, not a classifier, so labels are not needed.
For your own problem:
Since you need a classifier, you should use KNeighborsClassifier if you want the KNN approach. You can create your own feature matrix X and label vector y, as shown below:
X = [[h1, e1, s1], [h2, e2, s2], ...]
y = [label1, label2, ...]
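
To make this concrete, here is a minimal sketch with placeholder values; the feature values, labels, and n_neighbors=3 are assumptions to be replaced by your own data:

from sklearn.neighbors import KNeighborsClassifier

# Placeholder feature vectors [h, e, s] and labels; substitute your real data.
X = [[1.0, 0.2, 0.3],
     [0.9, 0.1, 0.4],
     [0.2, 0.8, 0.9],
     [0.1, 0.9, 0.8]]
y = [0, 0, 1, 1]

clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X, y)
print(clf.predict([[0.15, 0.85, 0.85]]))        # predicted label for a new example
print(clf.predict_proba([[0.15, 0.85, 0.85]]))  # membership probability for each label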