Comparing image in url with image in file system in python

Question

Is there a quick and easy way to make such a comparison?

I found several questions about comparing images on Stack Overflow, but none of them actually answers this one.

I have image files in my file system and a script that extracts images from URLs. I want to check whether the image at a URL is the same as the one on disk. Usually I load both the image on disk and the image from the URL into PIL objects and use the following function that I found:

    from PIL import ImageChops

    def equal(im1, im2):
        return ImageChops.difference(im1, im2).getbbox() is None

but this does not work if one of the images was saved to disk with PIL, because saving recompresses it, even if you set the quality to 100 with im1.save(outfile, quality=100).

At the moment my code is at http://pastebin.com/295kDMsp, but it always saves the image again.
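
If the file on disk is an untouched copy of what the URL serves (not something re-encoded by PIL), one option is to skip decoding altogether and compare digests of the raw bytes. The sketch below assumes that condition holds and that the downloaded payload is already in memory as `bytes`; `file_digest` and `same_bytes` are hypothetical helper names.

```python
import hashlib


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_bytes(downloaded, path):
    """True if the downloaded payload is byte-identical to the file at path."""
    return hashlib.sha256(downloaded).hexdigest() == file_digest(path)
```

This only detects exact copies: any re-encode changes the bytes even when the picture looks the same.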

+3

python diff image-processing python-imaging-library image-comparison

Oskari Kantoniemi Dec 14 '12 at 9:36

3 answers

You can write your own comparison using the squared difference. Then set a threshold, for example 95%, and if the images are at least that similar you do not need to download the image again. This works around the compression problem.
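
A minimal sketch of that idea, assuming both images have already been decoded to equally sized flat lists of 8-bit pixel values (for example `list(im.getdata())` on a grayscale PIL image). The function names, the normalization to a score between 0 and 1, and the 0.95 threshold are illustrative assumptions, not part of the answer.

```python
def squared_difference_similarity(pixels1, pixels2, max_value=255):
    """Return 1.0 for identical pixel lists and 0.0 for maximally different ones."""
    if len(pixels1) != len(pixels2) or len(pixels1) == 0:
        raise ValueError("pixel lists must be non-empty and the same length")
    total = sum((a - b) ** 2 for a, b in zip(pixels1, pixels2))
    mean_squared_error = total / float(len(pixels1))
    # Dividing by the largest possible squared difference maps the error to [0, 1].
    return 1.0 - mean_squared_error / float(max_value ** 2)


def similar_enough(pixels1, pixels2, threshold=0.95):
    """True if the similarity score reaches the chosen threshold."""
    return squared_difference_similarity(pixels1, pixels2) >= threshold
```

A suitable threshold depends on the images and the compression involved, so 0.95 should be treated as a starting point to tune.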

+1

Bartlomiej Lewandowski Dec 14 '12 at 9:44

As suggested by Bartlomiej Lewandowski, I would recommend comparing the entropy of the histogram, which is easy and relatively quick to calculate:

    from PIL import Image

    def histogram_entropy(im):
        """ Calculate the entropy of an image's histogram.
            Used for "smart cropping" in easy-thumbnails; see also
            https://raw.github.com/SmileyChris/easy-thumbnails/master/easy_thumbnails/utils.py """
        if not isinstance(im, Image.Image):
            return 0  # Fall back to a constant entropy.
        histogram = im.histogram()
        hist_ceil = float(sum(histogram))
        histonorm = [histocol / hist_ceil for histocol in histogram]

... This function is one that I use in an automatic cropping filter that I built, but you can use the entropy value to compare any two images (even ones of different sizes).
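
A self-contained sketch of comparing two images by histogram entropy. It takes plain histogram lists, such as those returned by PIL's `Image.histogram()`, so it runs without an image at hand. The Shannon entropy formula, the helper names, and the tolerance value are assumptions made for illustration; they are not taken from the truncated function above.

```python
import math


def entropy_from_histogram(histogram):
    """Shannon entropy, in bits, of a list of bin counts."""
    total = float(sum(histogram))
    if total == 0:
        return 0.0
    # Empty bins contribute nothing, so they are skipped.
    probabilities = [count / total for count in histogram if count]
    return -sum(p * math.log(p, 2) for p in probabilities)


def entropies_close(histogram1, histogram2, tolerance=0.1):
    """True if the two histogram entropies differ by at most `tolerance` bits."""
    difference = entropy_from_histogram(histogram1) - entropy_from_histogram(histogram2)
    return abs(difference) <= tolerance
```

Equal entropy does not imply equal images (a mirrored image has the same histogram, for instance), so this works best as a coarse first filter.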

I have other examples of applying this kind of idea; let me know in a comment if you would like me to post a specific one.

0

fish2000 Dec 14 '12 at 9:55

mmgp · Accepted Answer · 2012-12-14T19:18:19+0000

The title of the question suggests that you have two exact images to compare, and that is trivial to do. If instead you have similar images to compare, that explains why you did not find a fully satisfactory answer: there is no metric applicable to every problem that gives the expected results (note that the expected results vary between applications). One problem is that it is hard, in the sense that there is no general agreement, to compare images with multiple bands, such as color images. To handle this, I will apply a given metric to each band separately and take the lowest resulting value as the result. This assumes that the metric has a well-established range, such as [0, 1], and that the maximum value in this range means the images are identical (according to that metric). Conversely, the minimum value means the images are completely different.
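
The per-band rule can be sketched as follows. The helper `combine_bands` and the toy `fraction_equal` metric are hypothetical stand-ins: any metric that maps two bands to a score in [0, 1], with 1 meaning identical, could be passed in instead.

```python
def combine_bands(metric, bands1, bands2):
    """Apply `metric` to each pair of bands and keep the lowest score."""
    if len(bands1) != len(bands2):
        raise ValueError("images must have the same number of bands")
    return min(metric(b1, b2) for b1, b2 in zip(bands1, bands2))


def fraction_equal(band1, band2):
    """Toy metric for illustration only: share of positions with equal values."""
    matches = sum(1 for a, b in zip(band1, band2) if a == b)
    return matches / float(len(band1))


# Two tiny RGB "images", each given as three bands of four pixel values.
red1, green1, blue1 = [0, 1, 2, 3], [9, 9, 9, 9], [5, 5, 5, 5]
red2, green2, blue2 = [0, 1, 2, 3], [9, 9, 0, 0], [5, 5, 5, 0]
score = combine_bands(fraction_equal, [red1, green1, blue1], [red2, green2, blue2])
print(score)  # the green band matches worst, so it determines the result
```

Taking the minimum is the pessimistic choice: a difference confined to a single band is enough to lower the overall score.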

So all I will do here is give you two metrics. One of them is SSIM, and the other I will call NRMSE (a normalization of the root mean square error). I chose to present the second because it is a very simple method, and it may be enough for your problem.

Let's start with examples. The images are in the following order: f = original image in PNG, g1 = JPEG of f at 50% quality (made with convert f -quality 50 g), g2 = JPEG of f at 1% quality, h = "lightened" g2.

Results (rounded):

NRMSE (f, g1) = 0.96
NRMSE (f, g2) = 0.88
NRMSE (f, h) = 0.63
SSIM (f, g1) = 0.98
SSIM (f, g2) = 0.81
SSIM (f, h) = 0.55

To some extent both metrics handled the modifications well, but SSIM proved to be the more sensible one, reporting lower similarity when the images were in fact visually distinct and a higher value when they were visually very similar. The next example considers a color image (f = original image, g = JPEG at 5% quality).

NRMSE (f, g) = 0.92
SSIM (f, g) = 0.61

So it is up to you to decide which metric you prefer and what threshold value to use with it.

Now the metrics. What I call NRMSE is simply 1 - [RMSE / (maxval - minval)], where maxval is the maximum intensity over the two images being compared and minval is, correspondingly, the minimum. RMSE is the square root of MSE: sqrt[sum((A - B) ** 2) / |A|], where |A| is the number of elements in A. With this definition, RMSE is at most maxval - minval (which equals maxval when minval is 0), so NRMSE stays within [0, 1]. If you want to understand the meaning of MSE in images further, see, for example, https://ece.uwaterloo.ca/~z70wang/publications/SPM09.pdf. The SSIM (Structural SIMilarity) metric is more involved, and you can find the details in the same link. To apply the metrics easily, consider the following code:

    import numpy
    from scipy.signal import fftconvolve


    def ssim(im1, im2, window, k=(0.01, 0.03), l=255):
        """See https://ece.uwaterloo.ca/~z70wang/research/ssim/"""
        # Check if the window is smaller than the images.
        for a, b in zip(window.shape, im1.shape):
            if a > b:
                return None, None
        # Values in k must be positive according to the base implementation.
        for ki in k:
            if ki < 0:
                return None, None

        c1 = (k[0] * l) ** 2
        c2 = (k[1] * l) ** 2
        window = window / numpy.sum(window)

        mu1 = fftconvolve(im1, window, mode='valid')
        mu2 = fftconvolve(im2, window, mode='valid')
        mu1_sq = mu1 * mu1
        mu2_sq = mu2 * mu2
        mu1_mu2 = mu1 * mu2
        sigma1_sq = fftconvolve(im1 * im1, window, mode='valid') - mu1_sq
        sigma2_sq = fftconvolve(im2 * im2, window, mode='valid') - mu2_sq
        sigma12 = fftconvolve(im1 * im2, window, mode='valid') - mu1_mu2

        if c1 > 0 and c2 > 0:
            num = (2 * mu1_mu2 + c1) * (2 * sigma12 + c2)
            den = (mu1_sq + mu2_sq + c1) * (sigma1_sq + sigma2_sq + c2)
            ssim_map = num / den
        else:
            num1 = 2 * mu1_mu2 + c1
            num2 = 2 * sigma12 + c2
            den1 = mu1_sq + mu2_sq + c1
            den2 = sigma1_sq + sigma2_sq + c2
            ssim_map = numpy.ones(numpy.shape(mu1))
            index = (den1 * den2) > 0
            ssim_map[index] = (num1[index] * num2[index]) / (den1[index] * den2[index])
            index = (den1 != 0) & (den2 == 0)
            ssim_map[index] = num1[index] / den1[index]

        mssim = ssim_map.mean()
        return mssim, ssim_map


    def nrmse(im1, im2):
        a, b = im1.shape
        rmse = numpy.sqrt(numpy.sum((im2 - im1) ** 2) / float(a * b))
        max_val = max(numpy.max(im1), numpy.max(im2))
        min_val = min(numpy.min(im1), numpy.min(im2))
        return 1 - (rmse / (max_val - min_val))


    if __name__ == "__main__":
        import sys
        # Newer SciPy releases expose the window functions under scipy.signal.windows.
        from scipy.signal.windows import gaussian
        from PIL import Image

        img1 = Image.open(sys.argv[1])
        img2 = Image.open(sys.argv[2])

        if img1.size != img2.size:
            print("Error: image sizes differ")
            raise SystemExit

        # Create a 2d gaussian for the window parameter
        win = numpy.array([gaussian(11, 1.5)])
        win2d = win * (win.T)

        num_metrics = 2
        sim_index = [2 for _ in range(num_metrics)]
        for band1, band2 in zip(img1.split(), img2.split()):
            b1 = numpy.asarray(band1, dtype=numpy.double)
            b2 = numpy.asarray(band2, dtype=numpy.double)
            # SSIM
            res, smap = ssim(b1, b2, win2d)
            if res is None:
                # ssim returns None when the window is larger than the image.
                print("Error: images are smaller than the SSIM window")
                raise SystemExit

            m = [res, nrmse(b1, b2)]
            for i in range(num_metrics):
                sim_index[i] = min(m[i], sim_index[i])

        print("Result:", sim_index)
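
As a quick sanity check of the NRMSE formula, here is the arithmetic on two four-pixel bands, written in plain Python so it runs without numpy. The pixel values and the name `nrmse_flat` are made up for illustration.

```python
import math


def nrmse_flat(a, b):
    """NRMSE as defined above, for two equally long flat lists of intensities."""
    rmse = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / float(len(a)))
    max_val = max(max(a), max(b))
    min_val = min(min(a), min(b))
    return 1 - rmse / float(max_val - min_val)


a = [0, 0, 255, 255]
b = [0, 10, 255, 245]
# Squared differences: 0, 100, 0, 100, so MSE = 200 / 4 = 50 and RMSE = sqrt(50).
# The intensity range is 255 - 0, so the score is 1 - sqrt(50) / 255.
score = nrmse_flat(a, b)
print(round(score, 4))
```

The denominator `max_val - min_val` is zero when every pixel in both images has the same value, a case that would need special handling.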

Note that ssim refuses to compare images when the given window is larger than they are. The window is usually very small (the default is 11x11), so if your images are smaller than that there is not much "structure" (hence the metric's name) to compare, and you should use something else, such as the nrmse function. There is probably a better way to implement ssim, since in Matlab it runs much faster.
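
The size check can also be made up front to decide which metric to run. This is a sketch under the assumption of the default 11x11 window; `pick_metric` is a hypothetical helper that mirrors the shape comparison at the top of `ssim`.

```python
def pick_metric(image_shape, window_shape=(11, 11)):
    """Return "ssim" if the window fits inside the image, otherwise "nrmse"."""
    window_fits = all(w <= s for w, s in zip(window_shape, image_shape))
    return "ssim" if window_fits else "nrmse"


print(pick_metric((480, 640)))
print(pick_metric((8, 8)))
```

An image exactly the size of the window still counts as fitting, matching the `a > b` test in `ssim`.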
