How to implement a generic image classifier using SIFT and SVM

I want to train an SVM classifier to categorize images using scikit-learn, and I want to use the SIFT implementation in opencv-python to extract the image features. The situation is as follows:

1. The input to the scikit-learn SVM classifier is a 2D array in which each row represents one image, and every row must have the same number of elements.
2. The opencv-python SIFT algorithm returns a list of keypoints, which comes as a NumPy array.
So my question is:
How can I process the SIFT features so that they match the input of the SVM classifier? Can you help me?

Update 1:

Thanks for pyan's advice. I have adapted my proposal as follows:
1. Extract the SIFT feature vectors from each image.
2. Perform k-means clustering over all the vectors.
3. Build a feature dictionary, a.k.a. codebook, from the cluster centers.
4. Re-represent each image based on the feature dictionary, so that every image's representation has the same size.
5. Train and evaluate my SVM classifier.
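
The five steps above can be sketched roughly as follows. This is only a minimal illustration, not a tuned implementation: the descriptors are synthetic stand-ins generated by a hypothetical `fake_descriptors` helper, and the codebook size `k = 8` is an arbitrary assumption. With real images, the per-image arrays would instead be the descriptors returned by `cv2.SIFT_create().detectAndCompute(gray, None)`.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def fake_descriptors(label, n_keypoints):
    # Hypothetical stand-in for step 1: a (n_keypoints, 128) array per image.
    # The two classes are drawn around different means so they are separable.
    return rng.normal(loc=label * 3.0, scale=1.0, size=(n_keypoints, 128))

labels = np.array([0, 1] * 10)
per_image = [fake_descriptors(y, rng.integers(20, 40)) for y in labels]

# Steps 2-3: cluster all descriptors; the cluster centers form the codebook.
k = 8  # assumed codebook size, chosen only for this example
codebook = KMeans(n_clusters=k, n_init=10, random_state=0).fit(np.vstack(per_image))

def encode(descriptors):
    # Step 4: map each descriptor to its nearest codeword and count occurrences,
    # so every image becomes a normalized histogram of length k.
    words = codebook.predict(descriptors)
    hist = np.bincount(words, minlength=k).astype(float)
    return hist / hist.sum()

X = np.array([encode(d) for d in per_image])  # shape (n_images, k)

# Step 5: train on the first 14 images and evaluate on the remaining 6.
clf = SVC(kernel="linear").fit(X[:14], labels[:14])
accuracy = clf.score(X[14:], labels[14:])
```

Every image ends up as one row of `X` regardless of how many keypoints it had, which is the shape the SVM expects.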

Update 2:

I have collected the SIFT feature vectors of all images into one array (x * 128), which is very large, and now I need to perform clustering on it.
The problem is:
If I use k-means, the number of clusters must be set, and I do not know how to choose the optimal value; if I don't use k-means, which algorithm would be suitable for this?
Note: I want to use scikit-learn to perform the clustering.
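
One possible way to approach this in scikit-learn, sketched under assumptions: `MiniBatchKMeans` fits k-means on mini-batches and is therefore often easier to run on a very large descriptor array, and the number of clusters can be treated as a hyperparameter that is tried at several values. The random `descriptors` array and the candidate values `(16, 32, 64)` below are placeholders, not recommendations.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.default_rng(0)
# Placeholder for the real (x * 128) array of stacked SIFT descriptors.
descriptors = rng.normal(size=(5000, 128)).astype(np.float32)

inertias = {}
for k in (16, 32, 64):  # assumed candidate codebook sizes
    km = MiniBatchKMeans(n_clusters=k, batch_size=1024, n_init=3, random_state=0)
    km.fit(descriptors)
    # inertia_ is the sum of squared distances to the nearest center;
    # it is only a rough guide for comparing codebook sizes.
    inertias[k] = km.inertia_
```

Inertia alone does not say which codebook gives the best classifier, so it is probably more informative to compare the candidates by the validation accuracy of the final SVM.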

My proposal:
1. Run DBSCAN clustering on the vectors, which gives me label_size and the labels.
2. Because DBSCAN in scikit-learn cannot be used for prediction, I could train a new classifier A on the DBSCAN result.
3. Classifier A acts like a codebook: I can label the SIFT vectors of every image with it. After that, each image can be re-represented.
4. Based on the work above, I can train my final classifier B.
Note: to predict a new image, its SIFT vectors must be transformed by classifier A into the vector that classifier B takes as input.
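
A rough sketch of this proposal, under stated assumptions: the descriptors are three synthetic blobs, `eps=10.0` and `min_samples=5` are values picked to suit that toy data, and `KNeighborsClassifier` is just one possible choice for classifier A. None of these come from a real dataset.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
# Synthetic stand-in for stacked SIFT descriptors: three well-separated blobs.
centers = rng.normal(scale=10.0, size=(3, 128))
descriptors = np.vstack(
    [c + rng.normal(scale=0.5, size=(200, 128)) for c in centers]
)

# Step 1: DBSCAN finds the number of clusters on its own; -1 marks noise.
db = DBSCAN(eps=10.0, min_samples=5).fit(descriptors)
mask = db.labels_ != -1
n_words = db.labels_.max() + 1

# Step 2: classifier A learns to assign cluster labels to unseen descriptors.
clf_a = KNeighborsClassifier(n_neighbors=5).fit(descriptors[mask], db.labels_[mask])

def encode(image_descriptors):
    # Step 3: label each descriptor with classifier A, then build a normalized
    # histogram of length n_words as the image representation.
    words = clf_a.predict(image_descriptors)
    hist = np.bincount(words, minlength=n_words).astype(float)
    return hist / hist.sum()

# A new image's descriptors become the fixed-length input for classifier B.
new_image = centers[0] + rng.normal(scale=0.5, size=(30, 128))
vector_for_b = encode(new_image)
```

On real SIFT descriptors, `eps` would need tuning, and DBSCAN may mark many points as noise, so the resulting number of visual words should be checked before training classifier B.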

Can you give me some advice?

1 answer

Image classification can be quite general. To identify good features, you first need to be clear about what kind of output you want. For example, images can be classified by the scenes they show, such as nature, city views, indoors, and so on. Different kinds of classification may require different features.

The general approach used in computer vision for keypoint-based image classification is bag of words (bag of features) or dictionary learning. You can do a literature search to familiarize yourself with this topic. In your case, the basic idea would be to group the SIFT features into different clusters. Instead of feeding the SIFT features directly to scikit-learn, give it the vector of cluster frequencies as input, that is, how many of the image's features fall into each cluster. That way, each image is represented by a one-dimensional vector of fixed length.

A short introduction from Wikipedia: Bag-of-words model in computer vision
