Recognizing an Image From a Predefined List Using OpenCV SIFT and FLANN Matching

The goal of the application is to recognize an image from a predefined list of images. SIFT descriptors were computed for every image in the list and stored in files. Nothing interesting here:

    std::vector<cv::KeyPoint> detectedKeypoints;
    cv::Mat objectDescriptors;

    // Extract data
    cv::SIFT sift;
    sift.detect(image, detectedKeypoints);
    sift.compute(image, detectedKeypoints, objectDescriptors);

    // Save the data to a file
    cv::FileStorage fs(file, cv::FileStorage::WRITE);
    fs << "descriptors" << objectDescriptors;
    fs << "keypoints" << detectedKeypoints;
    fs.release();
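Loading the stored data back is symmetric; here is a minimal sketch of it (this is where the readDescriptors and readKeypoints used below come from):

    // Load the previously saved descriptors and keypoints (sketch, same file layout as above)
    cv::Mat readDescriptors;
    std::vector<cv::KeyPoint> readKeypoints;
    cv::FileStorage fs(file, cv::FileStorage::READ);
    fs["descriptors"] >> readDescriptors;
    cv::read(fs["keypoints"], readKeypoints);
    fs.release();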

Then the device takes a picture, and its SIFT descriptors are extracted in the same way. The idea is to compare these descriptors against the descriptors read from each file. I do this using the FLANN-based matcher from OpenCV, trying to quantify the similarity image by image. After going through the whole list, I should have the best match.

    cv::Ptr<cv::flann::IndexParams> indexParams = new cv::flann::KDTreeIndexParams(1);
    cv::Ptr<cv::flann::SearchParams> searchParams = new cv::flann::SearchParams(64);

    // Match using FLANN
    cv::FlannBasedMatcher matcher(indexParams, searchParams);
    std::vector<cv::DMatch> matches;
    matcher.match(objectDescriptors, readDescriptors, matches);

After matching, as I understand it, I get a list of the closest distances found between the feature vectors. I find the minimum distance and, using it, I can count the "good matches" and even collect a list of corresponding points.
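The min_dist computation itself is not in the snippets above; it is just a scan over the match distances, roughly:

    // Find the smallest distance among all matches (step not shown above)
    double min_dist = std::numeric_limits<double>::max();
    for (size_t i = 0; i < matches.size(); ++i)
    {
        if (matches[i].distance < min_dist)
            min_dist = matches[i].distance;
    }

With min_dist in hand, the counting and point collection look like this: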

    // Count the matches whose distance is less than 2 * min_dist
    std::vector<cv::Point2f> obj, scene; // points used later for the homography
    int goodCount = 0;
    for (int i = 0; i < objectDescriptors.rows; i++)
    {
        if (matches[i].distance < 2 * min_dist)
        {
            ++goodCount;
            // Save the points for the homography calculation
            obj.push_back(detectedKeypoints[matches[i].queryIdx].pt);
            scene.push_back(readKeypoints[matches[i].trainIdx].pt);
        }
    }

I am showing simplified pieces of code to make them easier to follow; I know some of it does not have to be here.

Continuing, I was hoping that simply counting the number of good matches like this would be enough, but it turned out that this mostly just pointed me to the image with the most descriptors. What I tried after this was a homography calculation. The goal was to compute it and see whether it was really a plausible homography or not. The hope was that a good match, and only a good match, would have a homography that is a sensible transformation. Creating the homography was done simply using cv::findHomography on obj and scene, which are std::vector<cv::Point2f>.
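For reference, that call is essentially the following; the CV_RANSAC flag and the reprojection threshold of 3 are my assumptions here, since the exact parameters are not in the snippets shown:

    // Compute the homography from the matched point sets
    // (CV_RANSAC and the threshold of 3 are assumed values, not from my original code;
    // newer OpenCV versions spell the flag cv::RANSAC)
    cv::Mat H = cv::findHomography(obj, scene, CV_RANSAC, 3);

I checked the validity of the homography using code I found online: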

    bool niceHomography(cv::Mat H)
    {
        std::cout << H << std::endl;

        // Determinant of the top-left 2x2 block; a negative value means the
        // transformation flips orientation, which a valid view change should not do
        const double det = H.at<double>(0, 0) * H.at<double>(1, 1)
                         - H.at<double>(1, 0) * H.at<double>(0, 1);
        if (det < 0)
        {
            std::cout << "Homography: bad determinant" << std::endl;
            return false;
        }

        // Norm of the first column: rejects extreme scaling along one axis
        const double N1 = sqrt(H.at<double>(0, 0) * H.at<double>(0, 0)
                             + H.at<double>(1, 0) * H.at<double>(1, 0));
        if (N1 > 4 || N1 < 0.1)
        {
            std::cout << "Homography: bad first column" << std::endl;
            return false;
        }

        // Norm of the second column: same check for the other axis
        const double N2 = sqrt(H.at<double>(0, 1) * H.at<double>(0, 1)
                             + H.at<double>(1, 1) * H.at<double>(1, 1));
        if (N2 > 4 || N2 < 0.1)
        {
            std::cout << "Homography: bad second column" << std::endl;
            return false;
        }

        // Perspective terms (third row) should stay small for a reasonable view change
        const double N3 = sqrt(H.at<double>(2, 0) * H.at<double>(2, 0)
                             + H.at<double>(2, 1) * H.at<double>(2, 1));
        if (N3 > 0.002)
        {
            std::cout << "Homography: bad third row" << std::endl;
            return false;
        }

        return true;
    }

I don't understand the math behind this, so when checking I sometimes replaced this function with a simple test of whether the determinant of the homography is positive. The problem is that I ran into trouble either way: the homographies were either bad, or good when they shouldn't have been (when I checked only the determinant).

I then decided to actually use the homography: take several points, compute their position in the target image from their position in the original image, and compare the average distance between the projected points and the matched points. Ideally, the correct image would give a clearly lower average distance. This did not work at all: all the distances were colossal. I thought I might have applied the homography in the wrong direction, but swapping obj and scene gave similar results.
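To make the idea concrete, the projection step I describe looks roughly like this (a sketch of the approach, not my exact code):

    // Project the object points into the scene image with H and measure the
    // average distance to the points they were matched with
    std::vector<cv::Point2f> projected;
    cv::perspectiveTransform(obj, projected, H);

    double totalDist = 0;
    for (size_t i = 0; i < projected.size(); ++i)
    {
        const cv::Point2f d = scene[i] - projected[i];
        totalDist += std::sqrt(d.x * d.x + d.y * d.y);
    }
    const double avgDist = projected.empty() ? 0.0 : totalDist / projected.size();
    // For the correct image, avgDist should be on the order of a few pixels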

Other things I tried: SURF descriptors instead of SIFT, BFMatcher (brute force) instead of FLANN, taking the n smallest distances for each image instead of a count depending on the minimum distance, and thresholding distances against the global maximum distance. None of these approaches gave me reliably good results, and I feel stuck right now.

My next strategy would be to sharpen the images, or even turn them into binary images using some kind of local threshold or a segmentation algorithm. I am looking for any suggestions, or for mistakes anyone can spot in my work.

I do not know how relevant this is, but I have added some of the images I am testing on. In many of the test images, most of the SIFT vectors come from the frame (which has higher contrast) rather than from the picture itself. That is why I think sharpening the images could work, but I don't want to go deeper if something I did earlier is wrong.

An image gallery is here, with descriptions in the titles. The images have a fairly high resolution; perhaps that gives some clues.

2 answers

You can try testing whether, when you draw the matches, the lines between the source image and the target image are roughly parallel. If it is not a correct match, you will have a lot of noise and the lines will not be parallel.

See the attached image, which shows a correct match (using SURF and BF): the lines are mostly parallel (although I should point out that this is an easy example).

[image: matches drawn between the two images, with mostly parallel lines]
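As a rough sketch of how that test could be coded (my interpretation, not the answerer's code): place the two images side by side the way cv::drawMatches does, turn every match into a line, and check that the line angles have a small spread. The 0.1-radian threshold is an arbitrary choice:

    // Returns true when the match lines are roughly parallel.
    // obj/scene are the matched points; offsetX is the width of the left image,
    // added to the scene x coordinates as if the images were drawn side by side.
    bool linesRoughlyParallel(const std::vector<cv::Point2f>& obj,
                              const std::vector<cv::Point2f>& scene,
                              float offsetX)
    {
        if (obj.empty())
            return false;

        // Angle of each match line
        std::vector<double> angles;
        for (size_t i = 0; i < obj.size(); ++i)
        {
            const float dx = scene[i].x + offsetX - obj[i].x;
            const float dy = scene[i].y - obj[i].y;
            angles.push_back(std::atan2(dy, dx));
        }

        // Mean and standard deviation of the angles
        double mean = 0;
        for (size_t i = 0; i < angles.size(); ++i)
            mean += angles[i];
        mean /= angles.size();

        double var = 0;
        for (size_t i = 0; i < angles.size(); ++i)
            var += (angles[i] - mean) * (angles[i] - mean);
        var /= angles.size();

        return std::sqrt(var) < 0.1; // small angular spread means roughly parallel
    }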


You are on the right track.

First, use the second-nearest-neighbor ratio test instead of your "2 * min_dist" good-match criterion; see fooobar.com/questions/814398/...
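A minimal sketch of that ratio test with the same FLANN matcher; the 0.8 threshold is the value commonly attributed to Lowe's paper, not something from the question:

    // Lowe's ratio test: ask for the two nearest neighbors of each descriptor
    // and keep a match only when the best distance is clearly smaller than
    // the second best (0.8 is the commonly used threshold)
    std::vector<std::vector<cv::DMatch> > knnMatches;
    matcher.knnMatch(objectDescriptors, readDescriptors, knnMatches, 2);

    std::vector<cv::DMatch> goodMatches;
    for (size_t i = 0; i < knnMatches.size(); ++i)
    {
        if (knnMatches[i].size() == 2 &&
            knnMatches[i][0].distance < 0.8f * knnMatches[i][1].distance)
        {
            goodMatches.push_back(knnMatches[i][0]);
        }
    }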

Secondly, use the homography in a different way. When you find the homography, you get not only the matrix H but also the number of matches consistent with it (the RANSAC inliers). Make sure this is a reasonable number, say >= 15; if it is fewer, the object is not matched.
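A sketch of that idea; findHomography fills in an inlier mask when you pass one, and the >= 15 cutoff is the number suggested above:

    // RANSAC marks each point pair as inlier (1) or outlier (0) in 'mask'
    std::vector<uchar> mask;
    cv::Mat H = cv::findHomography(obj, scene, CV_RANSAC, 3, mask);

    const int inliers = cv::countNonZero(mask);
    const bool objectMatched = (inliers >= 15);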

Thirdly, if there is a big viewpoint change, SIFT or SURF may fail to match the images. Try MODS instead (http://cmp.felk.cvut.cz/wbs/ has Windows and Linux binaries as well as a paper describing the algorithm), or ASIFT (much slower and noticeably worse, but open source): http://www.ipol.im/pub/art/2011/my-asift/

Or, at the very least, use the MSER or Hessian-Affine detector instead of the SIFT detector (keeping SIFT as the descriptor).

