How to identify non-photographic or “uninteresting” images using the Python Imaging Library (PIL)

I have thousands of images, and I need to weed out those that are neither photographs nor otherwise "interesting."

An “uninteresting” image, for example, can be just one color, or basically one color, or a simple icon / logo.

The solution does not have to be perfect, just good enough to remove the least interesting images.

My best idea is to take an arbitrary sample of pixels, and then ... do something with them.

1 answer

Dunfe beat me to it. Here is my method for calculating image entropy:

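    from math import log

    from PIL import Image  # Pillow; the original PIL used a bare "import Image"


    def get_histogram_dispersion(histogram):
        log2 = lambda x: log(x) / log(2)

        # Count how often each bin value occurs across the histogram.
        total = len(histogram)
        counts = {}
        for item in histogram:
            counts.setdefault(item, 0)
            counts[item] += 1

        # Shannon entropy (in bits) of that distribution of bin values.
        ent = 0
        for i in counts:
            p = float(counts[i]) / total
            ent -= p * log2(p)
        if ent == 0:
            return 0.0  # every bin holds the same count; avoids dividing by zero
        return -ent * log2(1 / ent)


    im = Image.open('test.png')
    h = im.histogram()
    print(get_histogram_dispersion(h))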
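Note that the function above measures how varied the histogram's bin counts are. A related measure, sketched here as an alternative and not as part of the method above, is the Shannon entropy of the pixel intensities themselves: a single-color image scores 0 bits and a grayscale image using all 256 levels equally scores 8 bits. The names `histogram_entropy`, `image_entropy` and `looks_uninteresting` are my own, and the `threshold=2.0` default is a made-up starting point that would need tuning on real images.

```python
from math import log2


def histogram_entropy(histogram):
    """Shannon entropy, in bits, of a list of histogram bin counts."""
    total = sum(histogram)
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in histogram:
        if count:  # empty bins contribute nothing to the sum
            p = count / total
            entropy -= p * log2(p)
    return entropy


def image_entropy(image):
    """Entropy of a PIL image's grayscale histogram (0 to 8 bits)."""
    # convert("L") reduces the image to one 256-bin intensity band,
    # so RGB images are not treated as three concatenated histograms.
    return histogram_entropy(image.convert("L").histogram())


def looks_uninteresting(image, threshold=2.0):
    # threshold is a hypothetical cutoff, not a value taken from the answer.
    return image_entropy(image) < threshold
```

With Pillow installed, `looks_uninteresting(Image.open('test.png'))` would flag flat fills and most simple icons, assuming the cutoff has been adjusted for the image set. Converting to grayscale first is a simplification: two different colors with the same brightness will look identical to this check.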