I have a few thoughts and a possible solution that you can consider.
First, consider tracking individual delta pixels and transmitting/saving only those. In a typical interactive session, only very small parts of the user interface change at any given moment; moving or resizing windows is (anecdotally) relatively rare over a long session of computer use. Per-pixel deltas therefore capture simple things like typed text, cursor movement, and small UI updates effectively and with little extra work.
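As a minimal sketch of the per-pixel delta idea, assuming frames arrive as NumPy arrays of shape (H, W, 3) (how you capture the frames is a separate problem):

```python
import numpy as np

def pixel_delta(prev_frame: np.ndarray, curr_frame: np.ndarray):
    """Return the coordinates and new values of every pixel that changed."""
    # Boolean mask of pixels where any colour channel differs.
    changed = np.any(prev_frame != curr_frame, axis=-1)
    ys, xs = np.nonzero(changed)
    # (y, x, new_value) triples -- this is what you would transmit or
    # append to the recording for this frame.
    return list(zip(ys.tolist(), xs.tolist(), curr_frame[ys, xs].tolist()))

def apply_delta(frame: np.ndarray, delta):
    """Replay a stored delta onto a frame during playback."""
    for y, x, value in delta:
        frame[y, x] = value
    return frame
```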
You could also try hooking into the OS at a lower level to obtain, for example, a list of dirty pixels or even (ideally) a list of "damage" rectangles. The Mac OS X Quartz compositor, for instance, may be able to provide this information. That lets you narrow down quickly what needs to be updated, and in the best case gives you an efficient delta view of the screen essentially for free.
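If the OS can hand you damage rectangles, storing deltas reduces to cropping those regions out of the new frame. A small sketch, where the `(x, y, w, h)` rectangle format is an assumption for illustration (real APIs each have their own types):

```python
import numpy as np

def deltas_from_damage(curr_frame: np.ndarray, damage_rects):
    """Keep only the sub-images covered by OS-reported damage rectangles.

    damage_rects: iterable of (x, y, w, h) tuples in pixel coordinates.
    Returns a list of (x, y, sub_image) entries to store for this frame.
    """
    deltas = []
    for x, y, w, h in damage_rects:
        # Copy so the stored delta does not alias the live frame buffer.
        deltas.append((x, y, curr_frame[y:y + h, x:x + w].copy()))
    return deltas

def apply_rect_deltas(frame: np.ndarray, deltas):
    """Blit stored damage regions back onto a frame during playback."""
    for x, y, sub in deltas:
        h, w = sub.shape[:2]
        frame[y:y + h, x:x + w] = sub
    return frame
```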
If you can query the window manager for information about the windows on screen, you could store a separate stream of pixel deltas per visible window, plus a simple display list describing how to "render" (composite) them during playback. Detecting a moved window then becomes trivial, because you only need to diff the display lists.
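A sketch of the per-window bookkeeping, assuming the window manager can report each window's id, position, and stacking order (the field names below are made up for illustration):

```python
from dataclasses import dataclass, field

@dataclass
class WindowEntry:
    """One display-list item: where a window sits and its stacking order."""
    window_id: int
    x: int
    y: int
    z: int  # stacking order, higher is closer to the viewer

@dataclass
class Frame:
    display_list: list                                  # list[WindowEntry]
    window_deltas: dict = field(default_factory=dict)   # window_id -> pixel deltas

def moved_windows(prev: Frame, curr: Frame):
    """Find moved windows by diffing the display lists -- no image processing."""
    prev_pos = {w.window_id: (w.x, w.y) for w in prev.display_list}
    return [w for w in curr.display_list
            if w.window_id in prev_pos and prev_pos[w.window_id] != (w.x, w.y)]
```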
If you can query the OS for the cursor position, you can use cursor movement as a cheap estimate of the motion delta, since cursor movement usually correlates well with object movement on screen (e.g. moving windows, moving icons, dragging objects, etc.). This avoids doing image processing just to determine the motion delta.
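A sketch that takes the cursor delta as the motion estimate, with an optional cheap sanity check comparing one small patch under the cursor (frames as NumPy arrays, cursor positions as (x, y) tuples; the patch size and tolerance are arbitrary choices):

```python
import numpy as np

def cursor_motion_hypothesis(prev_frame, curr_frame, cursor_prev, cursor_curr,
                             patch=64, tolerance=0.02):
    """Return (dx, dy) if the content under the cursor moved with the cursor."""
    (px, py), (cx, cy) = cursor_prev, cursor_curr
    dx, dy = cx - px, cy - py
    if dx == 0 and dy == 0:
        return None

    h, w = prev_frame.shape[:2]
    half = patch // 2
    # Give up near the screen edges to keep this sketch simple.
    if not (half <= px < w - half and half <= py < h - half and
            half <= cx < w - half and half <= cy < h - half):
        return None

    # Compare the patch around the old cursor position in the previous frame
    # with the patch around the new cursor position in the current frame.
    before = prev_frame[py - half:py + half, px - half:px + half]
    after = curr_frame[cy - half:cy + half, cx - half:cx + half]
    mismatch = np.mean(np.any(before != after, axis=-1))
    return (dx, dy) if mismatch < tolerance else None
```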
As for a possible solution (or a fallback if you still cannot determine the motion delta with the above): the (very common) case of a single moving rectangle can be handled fairly easily. Build a mask of all pixels that changed between frames. Find the largest connected component in the mask. If it is approximately rectangular, you can assume it represents a moved region. Either the window moved purely orthogonally (i.e. only in x or only in y), in which case the combined delta looks like a slightly larger rectangle, or it moved diagonally, in which case the combined delta has an eight-sided shape. In either case you can estimate the motion vector and verify it by comparing the shifted regions. Note that this procedure deliberately ignores details you would have to handle in practice, e.g. pixels that change independently of the window, or regions that do not change as the window moves (e.g. large blocks of solid colour inside it). A practical implementation would have to cope with all of that.
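A rough sketch of that procedure, using SciPy for the connected-component step. Instead of deriving the vector from the polygon shape, this version simply tries a small range of candidate shifts and keeps the one that explains most of the changed region, which keeps the code short; all of the caveats above still apply:

```python
import numpy as np
from scipy import ndimage

def estimate_window_motion(prev_frame, curr_frame, max_shift=32):
    """Estimate a single (dx, dy) motion vector for one moved rectangle."""
    # 1. Mask of all pixels that changed between the two frames.
    changed = np.any(prev_frame != curr_frame, axis=-1)

    # 2. Largest connected component of the change mask.
    labels, count = ndimage.label(changed)
    if count == 0:
        return None
    sizes = ndimage.sum(changed, labels, index=range(1, count + 1))
    component = labels == (int(np.argmax(sizes)) + 1)

    # 3. Bounding box of the component (covers old and new window positions).
    ys, xs = np.nonzero(component)
    y0, y1, x0, x1 = ys.min(), ys.max() + 1, xs.min(), xs.max() + 1
    region_prev = prev_frame[y0:y1, x0:x1]
    region_curr = curr_frame[y0:y1, x0:x1]

    # 4. Try candidate shifts and verify each by comparing the shifted regions:
    #    how much of the current region is explained by moving the old pixels?
    best, best_score = None, 0.0
    hh, ww = region_prev.shape[:2]
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            if dx == 0 and dy == 0:
                continue
            ya, yb = max(0, dy), min(hh, hh + dy)
            xa, xb = max(0, dx), min(ww, ww + dx)
            if yb - ya <= 0 or xb - xa <= 0:
                continue
            shifted = region_prev[ya - dy:yb - dy, xa - dx:xb - dx]
            target = region_curr[ya:yb, xa:xb]
            score = np.mean(np.all(shifted == target, axis=-1))
            if score > best_score:
                best, best_score = (dx, dy), score
    return best
```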
Finally, I would look at the existing literature on real-time motion estimation. A great deal of work has gone into optimising motion estimation and compensation, e.g. for video encoding, and you can draw on that work if the methods above turn out to be inadequate.