Algorithm for counting the number of unique image colors

Question


We are looking for an algorithm that is reasonably fast and still economical with memory. The image is a 24bpp System.Drawing.Bitmap.

+7

c# algorithm image-processing

user21623 Sep 24 '08 at 11:51


10 answers

Lou Franco · Answer 1 · 2008-09-24T11:57:11+0000

If you need an exact number, you have to iterate over all the pixels. Storing each color and its count in a hash table is probably the best way to go, because the colors actually used are sparse.

Using Color.ToArgb() as the hash key instead of the Color object is likely to be a good idea too.

Also, if speed is a serious issue, you don't want to use a function like GetPixel(x, y); instead, try to process chunks at a time (a row at a time). If you can, get a pointer to the start of the image memory and do it in unsafe code.
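
The row-at-a-time idea can be sketched as follows. This is only an illustration under stated assumptions: the pixel bytes are assumed to have been copied out of the bitmap already (for example with Bitmap.LockBits and Marshal.Copy), to be laid out as 3 bytes per pixel in blue, green, red order, and to have rows that are `stride` bytes apart. The class and method names are made up for this example.

```csharp
using System.Collections.Generic;

public static class ScanlineColorCounter
{
    // Counts distinct colors in a raw 24bpp buffer (assumed B, G, R byte order).
    // 'stride' is the number of bytes per row and may include padding bytes.
    public static int CountUniqueColors(byte[] pixels, int width, int height, int stride)
    {
        var seen = new HashSet<int>();
        for (int y = 0; y < height; y++)
        {
            int offset = y * stride; // start of this scan line
            for (int x = 0; x < width; x++, offset += 3)
            {
                // Pack the three channel bytes into one int key.
                int color = pixels[offset]
                          | (pixels[offset + 1] << 8)
                          | (pixels[offset + 2] << 16);
                seen.Add(color);
            }
        }
        return seen.Count;
    }
}
```

Walking the buffer by `stride` rather than by `width * 3` matters because rows are often padded, and the padding bytes must not be read as pixels.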

Jeebee · Answer 2 · 2008-09-24T12:00:23+0000

I have never implemented anything like this before, but this is how I see a primitive implementation:

For a 24-bit image, the maximum number of colors that the image can have is min(2^24, number of pixels in the image).

You only need to record whether a particular color has been seen, not how many times it has been seen. This means that you need 1 bit per possible color, which is 2^24 bits, or 2 MB of memory. Go through the pixels and set the corresponding bit in your 2 MB color map. At the end, iterate over the color map and count the set bits (if you're lucky, you will have a POPCNT instruction to help with this).

For smaller images and, of course, lower color depths, you might be better off keeping a color table with a count for each color that occurs in the image.
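
A minimal sketch of the bit-per-color map, assuming the pixels have already been packed into one 0xRRGGBB int each (the input format and all names here are illustrative, not part of the answer). The final count uses a portable bit-clearing loop rather than a hardware POPCNT instruction.

```csharp
public static class BitmapColorCounter
{
    // packedColors: one 24-bit color value per pixel (hypothetical input format).
    public static int CountUniqueColors(int[] packedColors)
    {
        // 2^24 bits, stored as 32-bit words: 524,288 words = 2 MB.
        var bits = new uint[(1 << 24) / 32];
        foreach (int c in packedColors)
        {
            int index = c & 0xFFFFFF;                 // keep only the 24 color bits
            bits[index >> 5] |= 1u << (index & 31);   // mark this color as seen
        }

        // Count the set bits in the whole map.
        int count = 0;
        foreach (uint word in bits)
        {
            uint w = word;
            while (w != 0)
            {
                w &= w - 1; // clears the lowest set bit
                count++;
            }
        }
        return count;
    }
}
```

The cost of the final pass is fixed at 524,288 words regardless of image size, which is why this approach pays off mainly for large images.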

Mecki · Answer 3 · 2008-09-24T16:03:23+0000

Most people have suggested solutions that are likely to be fast (the one that uses only 2 MB is probably acceptable in terms of memory usage and very fast; the one with a hash may be even faster, but it will definitely use more than 2 MB of memory). Programming is always a trade-off between memory usage and CPU time. Usually you get results faster if you are willing to "spend" more memory, or you get them slower by "spending" more computation time, which in turn usually saves you from heavy memory usage.

Here is one solution that no one has proposed so far. It is probably the one that costs the least memory (you can optimize it so that it hardly uses more memory than is needed to hold the image in memory; however, the image will be altered, so you may have to copy it first). I doubt that it can beat the hash or bitmask solutions in speed; it is only interesting if memory is your biggest concern.

Sort the pixels in the image by color. You can easily convert each pixel to a 32-bit number, and 32-bit numbers can be compared with each other: one number is less than, greater than, or equal to the other. If you use Quicksort, no additional storage is required for sorting apart from additional stack space. If you use Shellsort, no additional memory is required at all (although Shellsort will be much slower than Quicksort).
int num = (RED << 16) + (GREEN << 8) + BLUE;
Once you have sorted the pixels like this (which means you have rearranged them within the image), all pixels of the same color are always next to each other. You can then simply iterate over the image and count how often the color changes. For example, you save the color of the pixel at (0, 0) as the current color and start a counter with the value 1. Next you go to (0, 1). If it is the same color as before, there is nothing to do; continue with the next pixel, (0, 2). If it is not the same, increase the counter by one and remember the color of this pixel for the next iteration.
Once you have looked at the last pixel (and possibly increased the counter again if it was not the same as the second-to-last pixel), the counter contains the number of unique colors.
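
The sort-then-scan idea can be sketched like this, assuming the pixels are already available as an int array of packed color values (an assumption of this example; the names are illustrative). Array.Sort sorts in place, so the caller's array is reordered, which mirrors the point that the image itself gets rearranged.

```csharp
using System;

public static class SortedColorCounter
{
    // Sorts the packed colors in place, then counts how often the value changes.
    public static int CountUniqueColors(int[] packedColors)
    {
        if (packedColors.Length == 0) return 0;

        Array.Sort(packedColors); // in place; the caller's array is reordered

        int count = 1;                 // the first pixel is the first color
        int current = packedColors[0];
        for (int i = 1; i < packedColors.Length; i++)
        {
            if (packedColors[i] != current)
            {
                count++;               // a new run of equal colors starts here
                current = packedColors[i];
            }
        }
        return count;
    }
}
```

Beyond the array itself, this only needs the counter and one remembered color, plus whatever the sort uses internally.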

Iterating over all the pixels at least once is something you have to do in any case, regardless of the solution, so that step makes this solution neither slower nor faster than the others. The speed of this algorithm depends on how quickly you can sort the pixels of the image by color.

As I said, this algorithm is easily beaten when speed is your main concern (other solutions here are probably faster), but I doubt that it can be beaten when memory usage is your main concern. Besides the counter, storage for one color, and storage for the image itself, it only needs additional memory if your chosen sorting algorithm needs any.

Konrad Rudolph · Answer 4 · 2008-09-24T12:01:47+0000

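    var cnt = new HashSet<System.Drawing.Color>();
    for (int y = 0; y < image.Height; y++)
        for (int x = 0; x < image.Width; x++)
            cnt.Add(image.GetPixel(x, y)); // a Bitmap cannot be enumerated with foreach
    Console.WriteLine("The image has {0} distinct colours.", cnt.Count);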

EDIT: as Lou said, using .ToArgb() instead of the Color value itself may be slightly faster because of the way Color implements GetHashCode.

Rick Minerich · Answer 5 · 2008-09-24T16:06:26+0000

Most other implementations here will be slow. For this to be fast, you need direct access to the image buffer and some kind of sparse matrix for storing the color data.

First I will describe the 32bpp case, since it is much simpler:

HashSet: the sparse color matrix
ImageData: use the BitmapData object to access the underlying memory directly
PixelAccess: use an int* to reference the memory as ints, which you can iterate through

For each iteration, just do a hashset.Add of that integer. At the end, look at how many keys are in the HashSet; that is the total number of colors. It's important to note that resizing a HashSet is very painful (O(n), where n is the number of elements in the set), so you might want to create the HashSet with a reasonable size to start with; something like imageHeight * imageWidth / 4 may be fine.

In the 24bpp case, PixelAccess must be a byte*, and you need to iterate over 3 bytes for each color to build an int. For each of the 3 bytes in a group, shift the integer left by 8 (one byte) and add the byte to it. Now you have a 24bpp color represented as a 32-bit int, and the rest is the same.
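
Here is a rough sketch of the 32bpp case with a pre-sized set. To stay in safe code it reads each pixel with BitConverter.ToInt32 from a byte array instead of dereferencing an int*, which is a substitution made for this example. The buffer is assumed to have been copied out of the bitmap beforehand, and the HashSet capacity constructor assumes a reasonably recent runtime (.NET Framework 4.7.2 or .NET Core 2.0 and later). All names are illustrative.

```csharp
using System;
using System.Collections.Generic;

public static class Argb32ColorCounter
{
    // Counts distinct 32-bit pixel values in a raw 32bpp buffer.
    // 'stride' is the number of bytes per row.
    public static int CountUniqueColors(byte[] pixels, int width, int height, int stride)
    {
        // Pre-size the set to limit rehashing; width * height / 4 is the
        // starting guess suggested in the answer, not a tuned value.
        var seen = new HashSet<int>(Math.Max(1, width * height / 4));
        for (int y = 0; y < height; y++)
        {
            int offset = y * stride;
            for (int x = 0; x < width; x++, offset += 4)
            {
                // Four bytes per pixel are read as one int key.
                seen.Add(BitConverter.ToInt32(pixels, offset));
            }
        }
        return seen.Count;
    }
}
```

Note that the alpha byte is part of the key here, so two pixels that differ only in alpha count as two colors.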

Tall Jeff · Answer 6 · 2008-09-24T12:28:35+0000

You haven't really defined what you mean by unique colors. If you mean truly unique code values (as opposed to visually identical ones), then the only accurate solution is to actually count them using one of the methods described in the other answers.

If you are looking for visually similar colors, this quickly turns into a palette-mapping problem, in which you look for, say, the 256 best unique colors that you can use to represent your full-color image as closely as possible. For most images, it is amazing how well an image that starts out with 24 bits and up to 16 million different colors can be matched by an image with only 256 unique colors when those 256 colors are well chosen. It has been proven that the optimal selection of those 256 colors (for this example) is NP-complete, but there are practical solutions that come close. Look for the papers of a guy named Shijie Wang and things built on his work.

If you are looking for an approximate count of the code values in the image, I would compress the image using a lossless compression scheme. The compression ratio will be directly related to the number of unique code values in the image. You don't even need to keep the compressed output; just accumulate the number of bytes along the way and discard the actual output. Using a set of sample images as a reference, you can build a lookup table between the compression ratio and the number of distinct code values in the image. Again, this latter method, although fairly quick, will definitely be an approximation, but it should correlate well.
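
As a rough sketch of the compression idea, the following measures how well the raw pixel bytes compress with DeflateStream. Deflate is just one lossless scheme picked for illustration, and the class name is made up. For brevity the compressed output is buffered in a MemoryStream; a byte-counting stream would avoid storing it, as the answer suggests. The lookup table that maps a ratio to an estimated color count is not shown, since it would have to be calibrated on your own sample images.

```csharp
using System.IO;
using System.IO.Compression;

public static class CompressionEstimator
{
    // Returns compressed size divided by original size for the raw pixel bytes.
    // Smaller values suggest less variety in the data.
    public static double CompressionRatio(byte[] pixels)
    {
        if (pixels.Length == 0) return 0.0;

        using (var sink = new MemoryStream())
        {
            // leaveOpen: true keeps 'sink' readable after the compressor is disposed.
            using (var deflate = new DeflateStream(sink, CompressionMode.Compress, true))
            {
                deflate.Write(pixels, 0, pixels.Length);
            } // disposing flushes the remaining compressed bytes into 'sink'

            return (double)sink.Length / pixels.Length;
        }
    }
}
```

The ratio also depends on how the colors are arranged spatially, not only on how many there are, so treat it as a coarse indicator.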

Cruachan · Answer 7 · 2008-09-24T11:59:42+0000

Before modern graphics cards, when most machines ran in 256-color palette mode, this was an area of considerable interest. The limits on processing power and memory at the time imposed just the kind of constraints that may be useful to you, so a search for palette-handling algorithms is likely to turn up something useful.

ComSubVie · Answer 8 · 2008-09-24T12:00:12+0000

It depends on what kinds of images you want to analyze. For 24-bit images, you will need up to 2 MB of memory (since in the worst case you have to handle every color). A bitmap is probably the best idea for this (you have a 2 MB bitmap in which each bit corresponds to one color). This is a good solution for images with a lot of colors, and it can be implemented in O(#pixels). For 16-bit images, you only need 8 kB for the bitmap with this technique.

However, if you have images with only a small number of colors, it is better to use something else. But then you will need some kind of check to decide which algorithm to use ...

belugabob · Answer 9 · 2008-09-24T12:12:37+0000

The maximum number of unique colors in an image is equal to the number of pixels, so this is predictable from the very beginning of the process. The HashSet method proposed by Konrad seems like a reasonable solution, since the size of the hash will be no more than the number of pixels, whereas the bitmap approach proposed by Jeebee would require 512 MB for a 32-bit image (if there is an alpha channel and it is deemed to contribute to the uniqueness of a color).

However, the performance of the HashSet approach is likely to be worse than that of the bit-per-color approach; you may want to try both and run some tests using a lot of different images.
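
The memory figures quoted in this thread (8 kB, 2 MB, 512 MB) all come from the same arithmetic: one bit per possible color value, divided by 8 bits per byte. A small helper, with a name made up for this example, makes that explicit:

```csharp
public static class ColorBitmapSize
{
    // Bytes needed for a bitmap holding one bit per possible color value.
    public static long BytesForBitsPerPixel(int bitsPerPixel)
    {
        // 2^bitsPerPixel possible values, 8 flags per byte.
        return (1L << bitsPerPixel) / 8;
    }
}
```

For 16, 24, and 32 bits per pixel this gives 8,192 bytes, 2,097,152 bytes, and 536,870,912 bytes respectively.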

Liudvikas Bukys · Answer 10 · 2008-10-31T15:27:15+0000

The popular modern implementation of color quantization uses an octree. Take a look at the Wikipedia pages; the content is pretty good. The advantage of an octree is that its memory use is bounded: you can sample the whole image and choose your palette without additional memory. Once you understand the concept, look at the 1996 source code from a Dr. Dobb's Journal article.

Since this is a C# question, see the May 2003 MSDN article "Optimizing Color Quantization for ASP.NET Images", which includes some source code.
