It depends on what you mean by "duplicate".
If you are looking for exact, byte-for-byte copies (copy-paste), the job is simple: the approach proposed by Safir, with a few performance improvements, works well.
If you want to find near-duplicates, the problem suddenly becomes much harder. Have a look at the questions on image similarity comparison with OpenCV for more information.
Now, back to the "simple" approach: it depends on how many images you need to compare. Comparing each image against every other one in a folder of 1000 images means on the order of 1000 × 1000 = 1,000,000 pairings (about 500,000 if you skip symmetric pairs), and since you cannot keep all the images in RAM at once, you will be loading and unloading them roughly a million times. That is too much even for a powerful desktop machine.
An easier way is to compute a hash (e.g. SHA-2) of each image once and then compare only the hashes. A good ad-hoc "hash" for images is a histogram (for positive matches you should still double-check with memcmp, as sketched below).
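For illustration, here is a minimal C++ sketch of that idea, assuming OpenCV is available; the helper name and the 64-bin signature size are assumptions for this example, not part of the original answer. It uses a coarse grayscale histogram as the ad-hoc "hash", groups files with identical signatures, and leaves the final byte-level confirmation to a memcmp-style check.

```cpp
// Minimal sketch, not a drop-in implementation: bucket image files by a
// coarse histogram signature so that only files with identical signatures
// need a full pixel comparison afterwards.
#include <opencv2/opencv.hpp>
#include <iostream>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: a small, normalized grayscale histogram per file.
static std::vector<float> histogramSignature(const std::string& path)
{
    cv::Mat img = cv::imread(path, cv::IMREAD_GRAYSCALE);
    if (img.empty())
        return {};                          // unreadable file -> empty signature

    int histSize = 64;                      // 64 bins keep the signature small
    int channels[] = { 0 };
    float range[] = { 0.0f, 256.0f };
    const float* ranges[] = { range };
    cv::Mat hist;
    cv::calcHist(&img, 1, channels, cv::Mat(), hist, 1, &histSize, ranges);
    cv::normalize(hist, hist, 1.0, 0.0, cv::NORM_L1);  // size-independent

    return std::vector<float>(hist.begin<float>(), hist.end<float>());
}

int main(int argc, char** argv)
{
    // Group the files passed on the command line by their signature.
    std::map<std::vector<float>, std::vector<std::string>> buckets;
    for (int i = 1; i < argc; ++i)
        buckets[histogramSignature(argv[i])].push_back(argv[i]);

    // Files sharing a bucket are duplicate *candidates*; confirm them with
    // a byte-level comparison (see the memcmp sketch further down).
    for (const auto& entry : buckets) {
        const std::vector<std::string>& files = entry.second;
        if (files.size() < 2)
            continue;
        std::cout << "possible duplicates:";
        for (const auto& f : files)
            std::cout << ' ' << f;
        std::cout << '\n';
    }
    return 0;
}
```

Exact equality on the signature is enough here because byte-identical files always produce byte-identical histograms; anything fuzzier belongs to the near-duplicate problem mentioned above.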
Even if you go for the brute-force approach (comparing one image with another pixel by pixel), it is faster to use memcmp() on the raw pixel buffers than to access the images pixel by pixel.
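As a sketch of that (again assuming OpenCV for the decoded images; the helper name is made up), the whole pixel buffer of two images can be compared with a single memcmp call once their sizes and types match:

```cpp
// Minimal sketch: confirm that two decoded images are byte-identical with a
// single memcmp over their pixel buffers instead of a per-pixel loop.
#include <opencv2/opencv.hpp>
#include <cstring>

// Hypothetical helper: true only if both images have exactly the same pixels.
bool pixelsIdentical(const cv::Mat& a, const cv::Mat& b)
{
    if (a.size() != b.size() || a.type() != b.type())
        return false;

    // memcmp needs one contiguous buffer per image, so clone if the matrix
    // has row padding (isContinuous() == false).
    cv::Mat ca = a.isContinuous() ? a : a.clone();
    cv::Mat cb = b.isContinuous() ? b : b.clone();

    return std::memcmp(ca.data, cb.data, ca.total() * ca.elemSize()) == 0;
}
```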
Sam