How to create a single thumbnail for a billion png images?

The application stores about 1 billion PNG images (1024 × 1024, roughly 1 MB each). It needs to combine them into one huge image and then produce a single 1024 × 1024 thumbnail of it. Or maybe we don't really need to build the huge intermediate image at all, and some clever algorithm can produce the unified thumbnail directly in memory? Either way, the process needs to be as fast as possible: seconds would be ideal, a few minutes at most. Does anyone have an idea?


+8
image image-processing imagemagick png graphic
3 answers

Loading a billion images into a single montage process is a non-starter. Your question is not entirely clear, but your approach should be to work out how many pixels each source image contributes to the final image, then extract that many pixels from each image in parallel, and finally assemble those pixels into the final image.

So, if each image will be represented by one pixel in your final image, you need the average colour of each image, which you can get as follows:

 convert image1.png image2.png ... -format "%[fx:mean.r],%[fx:mean.g],%[fx:mean.b]:%f\n" info: 

Output:

 0.423529,0.996078,0:image1.png
 0.0262457,0,0:image2.png

You can do this very quickly in parallel with GNU Parallel, using something like:

 find . -name \*.png -print0 | parallel -0 convert {} -format "%[fx:mean.r],%[fx:mean.g],%[fx:mean.b]:%f\n" info: 

Then you can make the final image and place the individual pixels.
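A minimal Python sketch of that assembly step, assuming the mean-colour lines produced by the commands above have been collected into a list (pure Python for clarity; a real run at this scale would use numpy/Pillow):

```python
# Sketch only: turn per-image mean colours (one "r,g,b:filename" line per
# source image, as printed by `convert ... info:` above) into the pixels of
# the final thumbnail, laid out row by row in a grid.

def parse_means(lines):
    """Parse lines like '0.423529,0.996078,0:image1.png' into
    (filename, (r, g, b)) tuples with 0-255 channel values."""
    result = []
    for line in lines:
        rgb_part, _, filename = line.partition(":")
        r, g, b = (round(float(v) * 255) for v in rgb_part.split(","))
        result.append((filename, (r, g, b)))
    return result

def build_thumbnail(pixels, width):
    """Lay the per-image colours out row by row into a width-wide grid."""
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]

lines = ["0.423529,0.996078,0:image1.png", "0.0262457,0,0:image2.png"]
parsed = parse_means(lines)
grid = build_thumbnail([rgb for _, rgb in parsed], width=2)
print(parsed[0])   # ('image1.png', (108, 254, 0))
print(grid)
```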

Scanning even 1,000,000 PNG files is likely to take many hours ...

Your images are of the order of 1 MB each, and with 1,000,000,000 of them that is a petabyte of I/O just to read them. So even with an ultra-fast SSD sustaining 500 MB/s, you would be reading for 23 days.
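The arithmetic behind that estimate:

```python
# Back-of-the-envelope check of the I/O figure quoted above.
total_bytes = 1_000_000_000 * 1_000_000      # one billion images at ~1 MB each
read_rate = 500 * 1_000_000                  # 500 MB/s sustained SSD throughput
seconds = total_bytes / read_rate
days = seconds / 86_400
print(f"{total_bytes / 1e15:.0f} PB, {days:.0f} days")  # → "1 PB, 23 days"
```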

+9

ImageMagick can do this:

 montage *.png -tile 1000x1000 -geometry +0+0 tiled.png

If you do not want to use an external helper for any reason, you can still reuse its source code.

+3

A randomized algorithm, such as random sampling, may be feasible.

Given how large the combined image is, even an algorithm that is linear in the input size may be too slow, let alone anything more complex.

A quick calculation shows that each pixel of the thumbnail depends on about 1000 source images, so sampling only a subset of them barely affects the result.

The algorithm could look like this:

For each pixel coordinate of the thumbnail, randomly select N of the images that map to that location, sample M pixels from each of them, and compute their average value. Do the same for every other thumbnail pixel.
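A minimal Python sketch of that sampling scheme. The `get_pixel` accessor here is a hypothetical stand-in for real image I/O, and the N/M values are arbitrary illustrations:

```python
import random

# Estimate one thumbnail pixel by random sampling: pick n_images of the
# images that map to this pixel, read m_pixels random pixels from each,
# and average everything. get_pixel(img, x, y) is a placeholder you would
# back with actual file reads.

def sample_thumbnail_pixel(image_ids, get_pixel, n_images=8, m_pixels=4,
                           image_size=1024, rng=random):
    chosen = rng.sample(image_ids, min(n_images, len(image_ids)))
    samples = []
    for img in chosen:
        for _ in range(m_pixels):
            x = rng.randrange(image_size)
            y = rng.randrange(image_size)
            samples.append(get_pixel(img, x, y))
    return sum(samples) / len(samples)

# Toy stand-in: every pixel of image i has grey level i / 10.
value = sample_thumbnail_pixel(list(range(10)),
                               lambda img, x, y: img / 10,
                               rng=random.Random(0))
print(value)
```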

However, if your images are unrelated to each other, the result is usually a featureless mid-grey image: by the Central Limit Theorem, the variance of each averaged thumbnail pixel tends toward zero, so every pixel lands near 0.5. You will only see structure in the combined thumbnail if the arrangement of the images itself has structure.
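A quick simulation of that effect, modelling unrelated images as uniform random pixel values (an assumption for illustration only):

```python
import random
import statistics

# Each thumbnail pixel averages ~1000 unrelated pixel values; by the
# Central Limit Theorem the spread of those averages collapses, so every
# pixel ends up very close to mid-grey (0.5).

rng = random.Random(42)
thumbnail_pixels = [
    statistics.fmean(rng.random() for _ in range(1000))  # one pixel = mean of 1000 samples
    for _ in range(100)
]
spread = statistics.pstdev(thumbnail_pixels)
print(f"mean ~ {statistics.fmean(thumbnail_pixels):.3f}, stdev ~ {spread:.4f}")
```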

PS: using OpenCV would be a good choice

+3
