Rotate hundreds of JPEGs in seconds, not hours

Our computer receives hundreds of images at a time, and we need to rotate and modify them as quickly as possible. The rotation is always 90, 180 or 270 degrees.

We are currently using the GraphicsMagick command-line tool to rotate the images. Rotating a single image (5760 × 3840, about 22 MP) takes 4 to 7 seconds.

The following Python code sadly gives us similar timings:

    import cv

    img = cv.LoadImage("image.jpg")
    timg = cv.CreateImage((img.height, img.width), img.depth, img.channels)  # transposed image

    # rotate counter-clockwise
    cv.Transpose(img, timg)
    cv.Flip(timg, timg, flipMode=0)
    cv.SaveImage("rotated_counter_clockwise.jpg", timg)

Is there a faster way to rotate images using the power of a video card? OpenCL and OpenGL come to mind, but we are wondering if performance gains will be noticeable.

The hardware we use is quite limited, as the device should be as small as possible.

The software is Debian 6 with the official (closed-source) Radeon drivers.

+3
3 answers

You can do a lossless rotation that only changes the EXIF section (the orientation flag) and leaves the image data untouched. This will rotate your pictures much faster.
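A minimal sketch of the EXIF idea, under some assumptions: the file currently has the default orientation (1), whatever consumes the images honours the Orientation tag, and the `exiftool` command-line tool is installed (it is not mentioned in the question). The helper names are made up for illustration. The values 6, 3 and 8 are the standard EXIF Orientation codes for "display rotated 90, 180, 270 degrees clockwise".

```python
# EXIF Orientation codes: key = clockwise rotation (degrees) a viewer should
# apply when displaying the stored pixels, value = the tag value to write.
ORIENTATION_FOR_CW_ROTATION = {0: 1, 90: 6, 180: 3, 270: 8}


def orientation_tag(degrees_cw):
    """Return the EXIF Orientation value for a clockwise rotation."""
    return ORIENTATION_FOR_CW_ROTATION[degrees_cw % 360]


def exiftool_cmd(path, degrees_cw):
    """Build (but do not run) an exiftool command that rewrites only the tag.

    Assumption: exiftool is available on the target machine.
    """
    return [
        "exiftool",
        "-n",                                        # use the raw numeric value
        f"-Orientation={orientation_tag(degrees_cw)}",
        "-overwrite_original",                       # no *_original backup file
        str(path),
    ]
```

Passing the resulting list to `subprocess.run(cmd, check=True)` would execute it. Since only metadata is rewritten, the pixel data is never decoded, but the approach is only useful if every downstream program respects the tag.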

Also look at the jpegtran utility, which performs lossless JPEG transformations: http://linuxmanpages.com/man1/jpegtran.1.php
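Something along these lines might work for batch use; treat it as a sketch rather than a tested solution. It assumes `jpegtran` is on the PATH, and the function names, directory layout and worker count are placeholders. The `-rotate`, `-copy all`, `-perfect` and `-outfile` switches are standard jpegtran options; `-perfect` makes jpegtran fail instead of quietly altering edge blocks when the dimensions are not a multiple of the block size (5760 × 3840 is a multiple of 16 in both directions, so that should not trigger here).

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def jpegtran_cmd(src, dst, degrees):
    """Build a jpegtran command line for a lossless clockwise rotation."""
    if degrees not in (90, 180, 270):
        raise ValueError("degrees must be 90, 180 or 270")
    return [
        "jpegtran",
        "-rotate", str(degrees),   # clockwise, done on the DCT coefficients
        "-copy", "all",            # keep EXIF and other extra markers
        "-perfect",                # error out rather than alter edge blocks
        "-outfile", str(dst),
        str(src),
    ]


def rotate_all(src_dir, dst_dir, degrees, workers=4):
    """Rotate every .jpg in src_dir into dst_dir, several files at a time."""
    dst_dir = Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    cmds = [
        jpegtran_cmd(p, dst_dir / p.name, degrees)
        for p in sorted(Path(src_dir).glob("*.jpg"))
    ]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # The threads only wait on child processes, so the GIL is not a limit.
        list(pool.map(lambda cmd: subprocess.run(cmd, check=True), cmds))
    return len(cmds)
```

A call such as `rotate_all("incoming", "rotated", 90)` would process a whole directory. Because no decode and re-encode happens, the cost per file should be dominated by reading and writing it.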

+11

There is a lossless JPEG plugin for IrfanView which, IIRC, can rotate and resize images (in simple ways) without recompressing them. It can also be run on a whole directory of images, which should be much faster.

The GPU probably would not help. You are almost certainly limited by I/O in OpenCV, which is not well optimized for high-speed file access.

+4

I am not an expert on JPEG and compression, but since your problem becomes pretty much I/O-bound (provided you can rotate without the heavy computation involved in encoding), you may not be able to speed it up much on the GPU you have. (Un)fortunately, your baseline is a fairly slow Atom CPU.

I assume the Radeon has its own dedicated memory. This means the data must be transferred over PCI-E, which adds latency compared to running on the CPU, and unless you hide that latency you can be sure it is a bottleneck. This is the most likely reason that your OpenCV code using the GPU is slow (besides the fact that you are doing two memory-bound operations, transpose and flip, instead of one).

The key is to hide as much of the PCI-E transfer time as possible with multiple buffering. Overlapping transfers both to and from the GPU with computation, using the full-duplex capability of PCI-E, will only work if the card has dual DMA engines, as high-end Radeons or NVIDIA Quadro/Tesla cards do, and I really doubt that is the case here.
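To illustrate the buffering idea only, here is a host-side sketch in plain Python. It is not OpenCL code: `upload`, `compute` and `download` are hypothetical callables standing in for the host-to-device copy, the rotation kernel and the device-to-host copy. Each stage runs in its own thread, and the bounded queues of depth 2 play the role of the double buffers, so one image can be uploading while the previous one is being rotated and the one before that is being read back.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def _stage(work, inbox, outbox):
    """Apply `work` to each item from inbox and forward the result."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)
            return
        outbox.put(work(item))


def run_pipeline(items, upload, compute, download, depth=2):
    """Push items through three overlapping stages, preserving order.

    depth=2 means at most two items wait between stages (double buffering).
    """
    q0, q1, q2 = (queue.Queue(maxsize=depth) for _ in range(3))
    results = queue.Queue()  # unbounded, so the last stage never blocks
    threads = [
        threading.Thread(target=_stage, args=(upload, q0, q1)),
        threading.Thread(target=_stage, args=(compute, q1, q2)),
        threading.Thread(target=_stage, args=(download, q2, results)),
    ]
    for t in threads:
        t.start()
    for item in items:
        q0.put(item)  # blocks while `depth` items are already queued
    q0.put(_DONE)
    for t in threads:
        t.join()
    out = []
    while True:
        r = results.get()
        if r is _DONE:
            return out
        out.append(r)
```

In steady state the time per image approaches that of the slowest stage instead of the sum of all three, which is the whole point of the overlap. Whether the upload and download stages can really run at the same time depends on the DMA engines mentioned above.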

If your GPU compute time (the time the GPU takes to do the rotation) is lower than the transfer time, you won't be able to overlap them completely. The HD 4530 has a rather slow memory interface with a peak of only 12.8 GB/s, and the rotation kernel should be fully memory-bound. I can only estimate, but I would say that even if you reach a peak PCI-E transfer rate of ~1.5 GB/s (4x PCI-E, AFAIK), the compute kernel will be several times faster than the transfer, so you can overlap only a little. You can simply time the parts separately, without writing complex asynchronous code, and estimate how fast you can get with optimal overlap.
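A back-of-the-envelope version of that estimate, with loudly stated assumptions: the image is uncompressed 8-bit RGB (3 bytes per pixel), both bandwidth figures are read as gigabytes per second and taken at their peak, and the kernel reads and writes every pixel exactly once. JPEG decoding and encoding on the CPU are not included.

```python
WIDTH, HEIGHT, CHANNELS = 5760, 3840, 3    # image size from the question, 8-bit RGB assumed
PCIE_BYTES_PER_S = 1.5e9                   # assumed ~1.5 GB/s PCI-E peak
VRAM_BYTES_PER_S = 12.8e9                  # assumed 12.8 GB/s GPU memory peak

image_bytes = WIDTH * HEIGHT * CHANNELS    # size of one decoded image

upload_ms = image_bytes / PCIE_BYTES_PER_S * 1e3       # host -> GPU
download_ms = upload_ms                                # GPU -> host, same size
kernel_ms = 2 * image_bytes / VRAM_BYTES_PER_S * 1e3   # one read + one write per pixel

serial_ms = upload_ms + kernel_ms + download_ms
# One DMA engine: uploads and downloads cannot overlap each other,
# only the kernel can hide behind them.
single_dma_ms = max(upload_ms + download_ms, kernel_ms)
# Two DMA engines: all three stages overlap, the slowest one dominates.
dual_dma_ms = max(upload_ms, kernel_ms, download_ms)

print(f"image size       : {image_bytes / 1e6:.1f} MB")
print(f"upload / download: {upload_ms:.1f} ms each")
print(f"rotation kernel  : {kernel_ms:.1f} ms")
print(f"no overlap       : {serial_ms:.1f} ms per image")
print(f"single DMA engine: {single_dma_ms:.1f} ms per image")
print(f"dual DMA engines : {dual_dma_ms:.1f} ms per image")
```

Under these assumptions each transfer takes roughly 44 ms while the kernel needs only about 10 ms, so the transfers dominate and there is little compute to hide them behind.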

One thing you might want to consider is getting hardware that does not have PCI-E as a bottleneck, for example:

  • an AMD APU-based system. On these platforms you can page-lock host memory and use it directly from the GPU;
  • integrated GPUs that share the main memory with the host;
  • a fast low-power CPU such as a mobile Ivy Bridge, e.g. the i5-3427U, which consumes almost as little power as the Atom D525 but has AVX support and should be several times faster.
+1