The title of the question suggests that you have two exact images to compare, and that is trivial to do. If instead you have similar images to compare, that explains why you did not find a fully satisfactory answer: there is no metric applicable to every problem that gives the expected results (note that the expected results vary between applications). One of the problems is that it is hard, in the sense that there is no common agreement, to compare images with multiple bands, such as color images. To handle that, I will apply a given metric to each band separately, and the result of the metric will be the lowest value obtained across the bands. This assumes that the metric has a well-established range, such as [0, 1], and that the maximum value in this range means the images are identical (according to that metric). Conversely, the minimum value means the images are completely different.
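The per-band rule described above can be sketched as follows. This is only an illustration: the names min_over_bands and toy_metric are hypothetical, and the sketch assumes images stored as NumPy arrays laid out as height x width x bands, with a metric whose best score is the top of its range.

```python
import numpy as np

def min_over_bands(metric, im1, im2):
    """Apply `metric` to each band separately and return the lowest score."""
    a = np.asarray(im1, dtype=float)
    b = np.asarray(im2, dtype=float)
    if a.shape != b.shape:
        raise ValueError("images must have the same shape")
    if a.ndim == 2:
        # Single-band image: add a trailing band axis so the loop below works.
        a = a[..., np.newaxis]
        b = b[..., np.newaxis]
    return min(metric(a[..., i], b[..., i]) for i in range(a.shape[-1]))

def toy_metric(x, y):
    # Hypothetical placeholder metric for 8-bit data, in [0, 1]:
    # 1 minus the mean absolute difference scaled by 255.
    return 1.0 - np.abs(x - y).mean() / 255.0

rgb = np.zeros((4, 4, 3))
changed = rgb.copy()
changed[..., 2] = 255          # only the third band differs
score_same = min_over_bands(toy_metric, rgb, rgb)
score_changed = min_over_bands(toy_metric, rgb, changed)
```

Taking the minimum means a single badly matching band is enough to pull the overall score down, which is the conservative choice.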
So all I will do here is give you two metrics. One of them is SSIM, and the other I will call NRMSE (a normalization of the root of the mean squared error). I chose to present the second one because it is a very simple method, and it may be enough for your problem.
Let's start with examples. The images are in the following order: f = original image in PNG, g1 = f saved as JPEG at 50% quality (made with convert f -quality 50 g), g2 = f saved as JPEG at 1% quality, h = "lightened" g2.




Results (rounded):
- NRMSE (f, g1) = 0.96
- NRMSE (f, g2) = 0.88
- NRMSE (f, h) = 0.63
- SSIM (f, g1) = 0.98
- SSIM (f, g2) = 0.81
- SSIM (f, h) = 0.55
In a way, both metrics handled the modifications well, but SSIM proved to be the more sensible one: it reported lower similarity when the images were in fact visually distinct, and a higher value when the images were visually very similar. The next example considers a color image (f = original image and g = JPEG at 5% quality).


- NRMSE (f, g) = 0.92
- SSIM (f, g) = 0.61
So it is up to you to decide which metric you prefer and what threshold value to use with it.
Now the metrics. What I called NRMSE is simply 1 - [RMSE / (maxval - minval)], where maxval is the maximum intensity found in the two images being compared, and minval is correspondingly the minimum. RMSE is the square root of MSE: sqrt[sum((A - B) ** 2) / |A|], where |A| is the number of elements in A. Since no pixel difference can exceed maxval - minval, RMSE cannot exceed it either, so the result stays in [0, 1]. If you want to further understand the meaning of MSE in images, see, for example, https://ece.uwaterloo.ca/~z70wang/publications/SPM09.pdf . The SSIM (Structural SIMilarity) metric is more involved, and you can find the details in the same link. To easily apply the metrics, consider the following code:
    import numpy
    from scipy.signal import fftconvolve

    def ssim(im1, im2, window, k=(0.01, 0.03), l=255):
        """See https://ece.uwaterloo.ca/~z70wang/research/ssim/"""
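For illustration, here is a minimal sketch of how a mean-SSIM computation with that signature could look. It is not the original listing. The names ssim_sketch and gaussian_window are hypothetical, the default 11x11 Gaussian window with sigma 1.5 is an assumption, and the local statistics are computed with NumPy sliding windows instead of fftconvolve to keep the example dependency-light. It handles single-band images only.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def gaussian_window(size=11, sigma=1.5):
    """Normalized 2-D Gaussian weights of shape (size, size)."""
    ax = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    w = np.outer(g, g)
    return w / w.sum()

def ssim_sketch(im1, im2, window=None, k=(0.01, 0.03), l=255):
    """Mean SSIM of two single-band images (illustrative sketch)."""
    a = np.asarray(im1, dtype=float)
    b = np.asarray(im2, dtype=float)
    if window is None:
        window = gaussian_window()
    if a.ndim != 2 or a.shape != b.shape:
        raise ValueError("expected two single-band images of equal shape")
    if any(s < w for s, w in zip(a.shape, window.shape)):
        raise ValueError("window is larger than the images")

    def local_mean(x):
        # Weighted mean at every position where the window fits entirely
        # inside the image (equivalent to a 'valid' filtering pass).
        patches = sliding_window_view(x, window.shape)
        return np.einsum("ijkl,kl->ij", patches, window)

    c1 = (k[0] * l) ** 2   # stabilizes the luminance term
    c2 = (k[1] * l) ** 2   # stabilizes the contrast/structure term
    mu1, mu2 = local_mean(a), local_mean(b)
    var1 = local_mean(a * a) - mu1 ** 2
    var2 = local_mean(b * b) - mu2 ** 2
    cov = local_mean(a * b) - mu1 * mu2
    ssim_map = ((2 * mu1 * mu2 + c1) * (2 * cov + c2)) / (
        (mu1 ** 2 + mu2 ** 2 + c1) * (var1 + var2 + c2))
    return float(ssim_map.mean())

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(32, 32)).astype(float)
same_score = ssim_sketch(img, img)
inverted_score = ssim_sketch(img, 255.0 - img)
```

The sliding-window approach holds every patch view at once, so it is meant for small images. A convolution-based version would scale better.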
Note that the ssim function refuses to compare images when the given window is larger than they are. The window is usually very small (the default is 11x11), so if your images are smaller than that, there is not much "structure" (in the metric's sense) to compare, and you should use something else (like the other function, nrmse). There is probably a better way to implement ssim, since in Matlab it runs much faster.
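The nrmse function mentioned here follows directly from the formula 1 - [RMSE / (maxval - minval)]. A minimal sketch, assuming NumPy arrays of equal shape; treating two constant, equal images as identical (score 1.0) is my own choice for the degenerate case where maxval equals minval:

```python
import numpy as np

def nrmse(im1, im2):
    """Return 1 - RMSE / (maxval - minval); 1.0 means identical images."""
    a = np.asarray(im1, dtype=float)
    b = np.asarray(im2, dtype=float)
    if a.shape != b.shape:
        raise ValueError("images must have the same shape")
    # Root of the mean squared difference over all elements.
    rmse = np.sqrt(np.mean((a - b) ** 2))
    maxval = max(a.max(), b.max())
    minval = min(a.min(), b.min())
    if maxval == minval:
        # Every pixel in both images has the same value, so they are equal.
        return 1.0
    return float(1.0 - rmse / (maxval - minval))

black = np.zeros((4, 4))
white = np.full((4, 4), 255.0)
score_identical = nrmse(black, black)
score_opposite = nrmse(black, white)
score_half = nrmse(np.array([[0.0, 255.0]]), np.array([[0.0, 0.0]]))
```

For a multi-band image, this function can be applied to each band in turn, keeping the lowest score.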