The application is image recognition against an already defined list of images. SIFT descriptors were computed for every image in the list and saved to files. Nothing interesting here:
std::vector<cv::KeyPoint> detectedKeypoints;
cv::Mat objectDescriptors;

// Extract data
cv::SIFT sift;
sift.detect(image, detectedKeypoints);
sift.compute(image, detectedKeypoints, objectDescriptors);

// Save the file
cv::FileStorage fs(file, cv::FileStorage::WRITE);
fs << "descriptors" << objectDescriptors;
fs << "keypoints" << detectedKeypoints;
fs.release();
Then the device takes a picture. Its SIFT descriptors are extracted in the same way. Now the idea was to compare the descriptors from the picture against the descriptors in the files. I do this using the FLANN matcher from OpenCV, trying to quantify the similarity image by image. After going through the whole list, I should have the best match.
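Reading the saved data back is the mirror image of the save; a minimal sketch, assuming the variable names readDescriptors and readKeypoints that I use below:

cv::Mat readDescriptors;
std::vector<cv::KeyPoint> readKeypoints;
cv::FileStorage fs(file, cv::FileStorage::READ);
fs["descriptors"] >> readDescriptors;
cv::read(fs["keypoints"], readKeypoints);
fs.release();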
const cv::Ptr<cv::flann::IndexParams>& indexParams = new cv::flann::KDTreeIndexParams(1);
const cv::Ptr<cv::flann::SearchParams>& searchParams = new cv::flann::SearchParams(64);
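These parameters go into the FLANN-based matcher; roughly, the matching step looks like this (readDescriptors being the descriptors loaded from one of the files):

cv::FlannBasedMatcher matcher(indexParams, searchParams);
std::vector<cv::DMatch> matches;
// Query = descriptors from the camera image, train = descriptors read from file;
// this produces one cv::DMatch per query descriptor
matcher.match(objectDescriptors, readDescriptors, matches);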
After matching, as I understand it, I get a list of the distances between the closest feature vectors. I find the minimum distance and, using it, I can count "good matches" and even collect the corresponding points:
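The minimum distance itself is found with a trivial scan, something like:

double min_dist = std::numeric_limits<double>::max(); // needs <limits>
for (size_t i = 0; i < matches.size(); i++)
{
    if (matches[i].distance < min_dist)
        min_dist = matches[i].distance;
}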
// Count the number of matches where the distance is less than 2 * min_dist
int goodCount = 0;
for (int i = 0; i < objectDescriptors.rows; i++)
{
    if (matches[i].distance < 2 * min_dist)
    {
        ++goodCount;
        // Save the points for the homography calculation
        obj.push_back(detectedKeypoints[matches[i].queryIdx].pt);
        scene.push_back(readKeypoints[matches[i].trainIdx].pt);
    }
}
I am showing simplified pieces of code to make this easier to follow; I know some of it does not strictly have to be here.
Continuing, I was hoping that simply counting the number of good matches like this would be enough, but it turned out that it mostly just pointed to the image with the most descriptors. What I tried after this was a homography calculation. The goal was to compute it and check whether it was a valid homography or not. The hope was that a good match, and only a good match, would yield a homography that is a sensible transformation. The homography was created simply by calling cv::findHomography on obj and scene, which are std::vector<cv::Point2f>. I checked the validity of the homography using code I found online:
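Concretely, the call is just this (the CV_RANSAC flag is my choice here; the plain least-squares default behaves the same way API-wise):

// obj and scene are the std::vector<cv::Point2f> filled in the loop above;
// findHomography needs at least 4 point pairs
cv::Mat H = cv::findHomography(obj, scene, CV_RANSAC);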
bool niceHomography(cv::Mat H)
{
    std::cout << H << std::endl;

    const double det = H.at<double>(0, 0) * H.at<double>(1, 1) - H.at<double>(1, 0) * H.at<double>(0, 1);
    if (det < 0)
    {
        std::cout << "Homography: bad determinant" << std::endl;
        return false;
    }

    const double N1 = sqrt(H.at<double>(0, 0) * H.at<double>(0, 0) + H.at<double>(1, 0) * H.at<double>(1, 0));
    if (N1 > 4 || N1 < 0.1)
    {
        std::cout << "Homography: bad first column" << std::endl;
        return false;
    }

    const double N2 = sqrt(H.at<double>(0, 1) * H.at<double>(0, 1) + H.at<double>(1, 1) * H.at<double>(1, 1));
    if (N2 > 4 || N2 < 0.1)
    {
        std::cout << "Homography: bad second column" << std::endl;
        return false;
    }

    const double N3 = sqrt(H.at<double>(2, 0) * H.at<double>(2, 0) + H.at<double>(2, 1) * H.at<double>(2, 1));
    if (N3 > 0.002)
    {
        std::cout << "Homography: bad third row" << std::endl;
        return false;
    }

    return true;
}
I don't understand the math behind this, so while testing I sometimes replaced this function with a simple check that the determinant of the homography was positive. The problem is that the results were unreliable: the homographies were either bad, or good when they shouldn't have been (when I checked only the determinant).
I then decided to use the homography directly: for several points, calculate their position in the target image from their position in the source image, and compare the average distances. Ideally, the correct image would give a clearly lower average distance. This did not work at all: all the distances were huge. I thought I might have applied the homography in the wrong direction, but swapping obj and scene gave similar results.
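For reference, this is roughly how I compute that average distance, projecting obj through H with cv::perspectiveTransform and comparing against scene (swapping the roles gives the other direction):

std::vector<cv::Point2f> projected;
cv::perspectiveTransform(obj, projected, H);

double totalDist = 0;
for (size_t i = 0; i < projected.size(); i++)
{
    const double dx = projected[i].x - scene[i].x;
    const double dy = projected[i].y - scene[i].y;
    totalDist += std::sqrt(dx * dx + dy * dy); // needs <cmath>
}
const double avgDist = totalDist / projected.size();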
Other things I tried were SURF descriptors instead of SIFT, BFMatcher (brute force) instead of FLANN, taking the n smallest distances for each image instead of a number depending on the minimum distance, and taking distances relative to the global maximum distance. None of these approaches gave me reliably good results, and I feel stuck right now.
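The brute-force matcher was a drop-in replacement for the FLANN one, roughly:

cv::BFMatcher matcher(cv::NORM_L2); // L2 is the appropriate norm for SIFT/SURF
matcher.match(objectDescriptors, readDescriptors, matches);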
My next strategy would be to sharpen the images, or even turn them into binary images using some kind of local threshold or segmentation algorithm (see the sketch below). I am looking for any suggestions or mistakes anyone can spot in my work.
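The binarization I have in mind would be something along these lines (a sketch using cv::adaptiveThreshold, with parameters I have not tuned at all):

cv::Mat gray, binary;
cv::cvtColor(image, gray, CV_BGR2GRAY);
// Block size and offset picked arbitrarily for now
cv::adaptiveThreshold(gray, binary, 255,
                      cv::ADAPTIVE_THRESH_GAUSSIAN_C,
                      cv::THRESH_BINARY, 11, 2);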
I do not know how relevant this is, but I have added some of the images I am testing on. In many of the test images, most SIFT keypoints come from the frame (which has higher contrast) rather than from the picture itself. That is why I think sharpening the images could work, but I don't want to go deeper if something I did earlier is wrong.
Image gallery here, with descriptions in the titles. The images are at a fairly high resolution, in case that gives any clues.