The image cropping algorithm dilemma - is this possible?

I am building a web application using .NET 3.5 (ASP.NET, SQL Server, C#, WCF, WF, etc.) and I am faced with the main dilemma of the project. This is a university project, but what I develop is entirely up to me.

I need to create a system that takes an image and automatically crops a specific object inside it, without user input: for example, cropping out a car in a picture of a road. I have thought about this a lot and can't see any workable method, so I'd like this thread to discuss the challenges and whether the goal is feasible at all. In the end, I would get the dimensions of the car (or whatever the object is) and pass them as parameters to a custom 3D modeling application that displays a 3D model. That last step is far more achievable; the cropping is the real problem. I have considered various ideas, such as taking the car's color and tracing an outline around that color: if the car is yellow, find the yellow pixels in the image and trace around them. But this will not work if there are two yellow cars in the photo.
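A minimal Python sketch of the color-threshold idea (Python only for brevity; the logic ports directly to C#). It assumes an image stored as a list of rows of (r, g, b) tuples; the function name `color_bounding_box`, the tolerance and the sample colors are illustrative assumptions. It returns a single bounding box around every pixel near the target color, which shows why two yellow cars get merged into one box.

```python
# Hypothetical representation: an image is a list of rows of (r, g, b) tuples.
def color_bounding_box(image, target, tolerance):
    """Return (left, top, right, bottom), right/bottom exclusive, around all
    pixels within `tolerance` of `target` on every channel; None if none match."""
    xs, ys = [], []
    for y, row in enumerate(image):
        for x, (r, g, b) in enumerate(row):
            if (abs(r - target[0]) <= tolerance
                    and abs(g - target[1]) <= tolerance
                    and abs(b - target[2]) <= tolerance):
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return (min(xs), min(ys), max(xs) + 1, max(ys) + 1)


YELLOW = (255, 220, 0)
GREY = (90, 90, 90)

# A tiny 8x3 "road" containing two separate yellow "cars".
road = [[GREY] * 8 for _ in range(3)]
for x in (1, 2, 5, 6):
    road[1][x] = YELLOW

print(color_bounding_box(road, YELLOW, 30))  # (1, 1, 7, 2): one box spanning both cars
```

Nothing in a pure color test tells the two blobs apart; separating them would at least require grouping matching pixels into connected regions first.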

Ideally, I would like the system to be fully automated, but I doubt I can manage that on my own. Also, my skills lie in the technologies mentioned above (.NET 3.5, SQL Server, AJAX, web design) rather than C++, but I am open to any solution, if only to gauge feasibility.

I also found this patent: US Patent 7034848 - System and Method for Automatic Cropping of Graphic Images

Thanks

algorithm image-processing computer-vision
8 answers

This is one of the problems that had to be solved to complete the DARPA Grand Challenge. The video has an excellent presentation of the project by the leader of the winning team, in which he talks about how they solved it and how some of the other teams approached it. The relevant part begins around 19:30 into the video, but the whole talk is excellent and worth watching. I hope this gives you a good starting point for your problem.

What you are talking about is an open research problem, or even several research problems. One way to approach it is image segmentation. If you can safely assume that there is one object of interest per image, you can try a figure-ground segmentation algorithm. There are many such algorithms, and none of them is perfect. They usually output a segmentation mask: a binary image in which the figure is white and the background is black. You then find the bounding box of the figure and use it to crop. Bear in mind that no existing segmentation algorithm will give you what you want 100% of the time.
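The last step, going from a segmentation mask to a crop, is straightforward. A minimal Python sketch, assuming the mask is a list of rows of 0/1 values (1 = figure) with the same size as the image; the function names are illustrative.

```python
def mask_bounding_box(mask):
    """Bounding box (left, top, right, bottom), right/bottom exclusive,
    of the foreground (nonzero) pixels of a binary mask; None if empty."""
    if not mask:
        return None
    rows = [y for y, row in enumerate(mask) if any(row)]
    cols = [x for x in range(len(mask[0])) if any(row[x] for row in mask)]
    if not rows or not cols:
        return None
    return (cols[0], rows[0], cols[-1] + 1, rows[-1] + 1)


def crop(image, box):
    """Crop a list-of-rows image to a (left, top, right, bottom) box."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
box = mask_bounding_box(mask)
print(box)              # (1, 1, 4, 3)
print(crop(mask, box))  # [[1, 1, 0], [0, 1, 1]]
```

In practice you would apply `crop` to the original photo rather than to the mask, and the box width and height are the object dimensions you would pass on.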

Alternatively, if you know in advance which specific type of object you need to crop (car, person, motorcycle), you can try an object detection algorithm. Again, there are many, and none of them is perfect. On the other hand, some of them may work better than segmentation if your object of interest sits against a very cluttered background.
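Many classic object detectors share a sliding-window structure: score every window of the image with a trained classifier and keep the windows that score high enough. A minimal single-scale Python sketch; the `score` function here is a toy stand-in (mean brightness) for a real trained classifier, and all names are hypothetical.

```python
def sliding_window_detect(image, win_w, win_h, score, threshold, step=1):
    """Slide a win_w x win_h window over a grayscale image (list of rows) and
    return boxes (left, top, right, bottom) whose score is >= threshold."""
    height, width = len(image), len(image[0])
    hits = []
    for top in range(0, height - win_h + 1, step):
        for left in range(0, width - win_w + 1, step):
            patch = [row[left:left + win_w] for row in image[top:top + win_h]]
            if score(patch) >= threshold:
                hits.append((left, top, left + win_w, top + win_h))
    return hits


def mean_brightness(patch):
    # Toy stand-in for a classifier: the average pixel value of the patch.
    values = [v for row in patch for v in row]
    return sum(values) / len(values)


image = [
    [0, 0, 0, 0],
    [0, 9, 9, 0],
    [0, 9, 9, 0],
    [0, 0, 0, 0],
]
print(sliding_window_detect(image, 2, 2, mean_brightness, 9))  # [(1, 1, 3, 3)]
```

Real detectors also repeat the scan at several window sizes and merge overlapping hits (non-maximum suppression); both are omitted here for brevity.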

To summarize, if you want to do this, you will need to read a large number of computer vision papers and try many different algorithms. You will also improve your chances of success if you restrict your problem domain as much as possible: for example, limit yourself to a small number of object categories, assume there is only one object of interest per image, or limit yourself to a certain type of scene (nature, seascapes, etc.). Also keep in mind that even state-of-the-art approaches to such problems leave a lot of room for improvement in accuracy.

And by the way, choosing the language or platform for this project is the least difficult part.

A method often used to detect faces in images is a cascade of classifiers built on Haar-like features. A classifier cascade can be trained to detect any kind of object, not just faces, but how well it performs depends heavily on the quality of the training data.

This Viola and Jones article explains how it works and how it can be optimized.
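Part of what makes the Viola and Jones approach fast is the integral image (summed-area table): after one pass over the image, the sum of any rectangle costs four lookups, so rectangle-difference (Haar-like) features are cheap to evaluate. A minimal Python sketch of that idea; the function names and the single horizontal two-rectangle feature are illustrative, and a real cascade combines many such features chosen by training.

```python
def integral_image(img):
    """ii[y][x] = sum of img over rows < y and columns < x (zero-padded)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, left, top, width, height):
    """Sum of a rectangle with four lookups, independent of its size."""
    right, bottom = left + width, top + height
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def two_rect_feature(ii, left, top, width, height):
    """Horizontal two-rectangle Haar-like feature: left half minus right half.
    Assumes `width` is even."""
    half = width // 2
    return (rect_sum(ii, left, top, half, height)
            - rect_sum(ii, left + half, top, half, height))


img = [
    [9, 9, 1, 1],
    [9, 9, 1, 1],
]
ii = integral_image(img)
print(rect_sum(ii, 0, 0, 4, 2))          # 40
print(two_rect_feature(ii, 0, 0, 4, 2))  # 36 - 4 = 32
```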

Although it is C++, you might want to take a look at the image processing libraries from the OpenCV project, which include code for both training and using Haar cascades. You will need a set of car and non-car images to train the system!

Some of the best attempts I have seen at this use a large image database to help interpret the image you have. Nowadays there is Flickr, which is not only a giant corpus of images but one whose images are tagged with meta-information.
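One very simple way to picture "using a tagged database to understand an image" is nearest-neighbor tag transfer: describe each image with a feature vector, find the most similar database images, and let their tags vote. The Python sketch below uses a coarse RGB color histogram as the feature; the feature choice, the names and the toy database are assumptions for illustration, not how the linked projects work.

```python
from collections import Counter


def color_histogram(pixels, bins=4):
    """Normalized joint RGB histogram with `bins` levels per channel."""
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        idx = (r * bins // 256) * bins * bins + (g * bins // 256) * bins + (b * bins // 256)
        hist[idx] += 1
    return [h / len(pixels) for h in hist]


def suggest_tags(query_pixels, database, k=2):
    """Tags voted by the k database images whose histograms are closest (L1)."""
    q = color_histogram(query_pixels)

    def distance(item):
        return sum(abs(a - b) for a, b in zip(q, color_histogram(item[0])))

    nearest = sorted(database, key=distance)[:k]
    votes = Counter(tag for _, tags in nearest for tag in tags)
    return [tag for tag, _ in votes.most_common()]


# Toy "database" of (pixels, tags) pairs standing in for tagged photos.
database = [
    ([(250, 210, 10)] * 4, ["car", "yellow"]),
    ([(20, 120, 20)] * 4, ["tree"]),
    ([(240, 200, 30)] * 4, ["car", "taxi"]),
]
query = [(245, 215, 5)] * 4
print(suggest_tags(query, database))  # "car" gets the most votes
```

Color alone is a weak cue; the point is the structure (feature, nearest neighbors, tag voting), with better features plugged in as needed.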

Some of the projects that do this are described here:

http://blogs.zdnet.com/emergingtech/?p=629

Start by analyzing the images yourself. That way you can formulate criteria that identify a car, and determine what you cannot match.

If all the cars are photographed against the same background, for example, this need not be complicated. But your example is a car on a street. There may also be parked cars: should they be recognized too?
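The "same background" case can be handled with plain background subtraction: compare each photo with a shot of the empty background and mark the pixels that changed. A minimal Python sketch, assuming grayscale images of identical size and a known empty-background shot; the threshold value and names are assumptions.

```python
def foreground_mask(frame, background, threshold):
    """1 where the grayscale frame differs from the known background by more
    than `threshold`, else 0. Both images are lists of rows of equal size."""
    return [[1 if abs(f - b) > threshold else 0 for f, b in zip(frow, brow)]
            for frow, brow in zip(frame, background)]


background = [[10, 10, 10], [10, 10, 10]]
frame = [[10, 200, 10], [12, 190, 10]]
print(foreground_mask(frame, background, 20))  # [[0, 1, 0], [0, 1, 0]]
```

The resulting mask's bounding box gives the crop. This breaks down as soon as the background varies, which is exactly the street scene you describe.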

If you have access to MATLAB, you can test pattern recognition filters with a specialized toolbox such as PRTools.

When I was a student (a long time ago :), I used Khoros Cantata and found that an edge filter can greatly simplify an image.
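As an illustration of what an edge filter does, here is a minimal Python sketch of the Sobel operator, one common edge filter (which filters Cantata offered is not stated here). It assumes a grayscale image as a list of rows and leaves the one-pixel border at zero.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(img):
    """Gradient magnitude of a grayscale image; border pixels stay 0.0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0
            for dy in range(3):
                for dx in range(3):
                    v = img[y + dy - 1][x + dx - 1]
                    gx += SOBEL_X[dy][dx] * v
                    gy += SOBEL_Y[dy][dx] * v
            out[y][x] = math.hypot(gx, gy)
    return out


# A vertical step from dark (0) to bright (10): the edge lights up, flat areas do not.
edge_img = [[0, 0, 10, 10]] * 3
print(sobel_magnitude(edge_img)[1])  # [0.0, 40.0, 40.0, 0.0]
```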

But again, first define the conditions on your input. If you don't, you will not succeed, because pattern recognition is really difficult (think about how long it took to crack CAPTCHAs).

I did say a photo, so it could be a black car against a black background. I thought about specifying the color of the object and then, wherever that color is found, tracing around it (a high-level explanation). But with a black object on a black background (no contrast at all), this would be a very difficult task.

On the plus side, I have come across several sites with 3D car models. I could always use one of those, load it into my 3D application and render it.

Working with a 3D model would be easier; real-world photography is much more complicated. It sucks :(

If I read this right... this is where AI shines.

I think the “simplest” solution would be a neural-network pattern recognition algorithm. Unless you know the car will look exactly the same in every shot, this is almost the only way.
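The smallest possible neural network is a single perceptron. The Python sketch below trains one with the classic perceptron learning rule to separate tiny flattened image patches labeled "car" (1) and "not car" (0). The patches, labels and names are hypothetical toy data; a real recognizer would need a multilayer network, good features and far more training images.

```python
def train_perceptron(samples, labels, epochs=20, lr=1.0):
    """Classic perceptron learning rule.
    samples: equal-length feature lists; labels: 0 or 1."""
    weights, bias = [0.0] * len(samples[0]), 0.0
    for _ in range(epochs):
        for x, target in zip(samples, labels):
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            error = target - (1 if activation > 0 else 0)
            if error:
                # Nudge the decision boundary toward the misclassified sample.
                weights = [w + lr * error * xi for w, xi in zip(weights, x)]
                bias += lr * error
    return weights, bias


def predict(weights, bias, x):
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0


# Hypothetical 2x2 patches, flattened row by row.
samples = [[1, 1, 0, 0], [1, 1, 0, 1], [0, 0, 1, 1], [0, 1, 1, 1]]
labels = [1, 1, 0, 0]
weights, bias = train_perceptron(samples, labels)
print([predict(weights, bias, x) for x in samples])  # [1, 1, 0, 0]
```

A single perceptron can only learn linearly separable classes, which is why practical recognizers stack many such units into layers.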

If it does look the same every time, you can simply search for that pixel pattern, get its bounding box, and crop the image to that rectangle.
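Searching for a fixed pixel pattern is template matching. A minimal Python sketch using the sum of squared differences (SSD) over every position, assuming grayscale images as lists of rows; the names and toy data are illustrative.

```python
def match_template(image, template):
    """Exhaustive template matching by sum of squared differences (SSD).
    Returns the best-matching box (left, top, right, bottom), right/bottom exclusive."""
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    best, best_box = None, None
    for top in range(ih - th + 1):
        for left in range(iw - tw + 1):
            ssd = sum((image[top + y][left + x] - template[y][x]) ** 2
                      for y in range(th) for x in range(tw))
            if best is None or ssd < best:
                best, best_box = ssd, (left, top, left + tw, top + th)
    return best_box


def crop(image, box):
    """Crop a list-of-rows image to a (left, top, right, bottom) box."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


template = [[5, 6], [7, 8]]
scene = [
    [0, 0, 0, 0, 0],
    [0, 0, 5, 6, 0],
    [0, 0, 7, 8, 0],
    [0, 0, 0, 0, 0],
]
box = match_template(scene, template)
print(box)               # (2, 1, 4, 3)
print(crop(scene, box))  # [[5, 6], [7, 8]]
```

This only works when the object's appearance, size and orientation stay fixed, which matches the condition stated above.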

I think you will never get good results without a real user telling the program what to do. Think of it this way: how will your program decide when there is more than one interesting object (for example, two cars)? What if the object you want is actually a mountain in the background? What if there is nothing interesting in the picture, so there is nothing to choose as the object to crop? And so on.

That said, if you can make assumptions such as "there will be only one object", then you can proceed with image recognition algorithms.
Now that I think about it: I recently attended a lecture on artificial intelligence in robots and robotics research methods. Their research focused on linguistic interaction, evolution, and language recognition. But for that, they also needed some simple pattern recognition algorithms to process the perceived environment. One of the tricks they used was to create a 3D plot of the image, where x and y are the normal x and y axes and the z axis is the brightness at that point. They then used the same technique for red-green values and for blue-yellow values. That gave them something (relatively simple) they could use to distinguish objects in the perceived environment.
(Sorry, I can't find a link to the good diagrams they showed of how it all worked.)
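The three "height maps" described above (brightness, red-green, blue-yellow) can be sketched in a few lines of Python. The exact formulas used in that lecture are unknown; the ones below are one common, simple opponent-color choice and should be read as an assumption.

```python
def opponent_channels(image):
    """Split an RGB image (rows of (r, g, b) tuples) into three height maps:
    brightness, red-green and blue-yellow. Formulas are one simple assumed choice."""
    lum, rg, by = [], [], []
    for row in image:
        lum.append([(r + g + b) / 3 for r, g, b in row])  # brightness
        rg.append([r - g for r, g, b in row])             # red vs green
        by.append([b - (r + g) / 2 for r, g, b in row])   # blue vs yellow
    return lum, rg, by


lum, rg, by = opponent_channels([[(255, 0, 0), (0, 0, 255)]])
print(lum)  # [[85.0, 85.0]]: same brightness...
print(rg)   # [[255, 0]]
print(by)   # [[-127.5, 255.0]]: ...but clearly separated here
```

Viewed as surfaces z = f(x, y), an object often shows up as a plateau or peak in at least one of the three maps even when it does not stand out in the others.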

In any case, the point is that they were not (that) interested in pattern recognition itself, so they built something less advanced, and therefore less time-consuming, that worked well enough. In the same way, you may be able to create something simple for this difficult task.

Also, any good image editing program has some kind of magic wand that, with the right settings, selects the object of interest you point at; its algorithm may be worth your time as well.

So this will basically mean that you:

  • must make some assumptions, otherwise it will fail;
  • are probably best served by methods from AI and, more specifically, pattern recognition;
  • could look at Paint.NET and the algorithm behind its magic wand;
  • could try to exploit the fact that a good photograph usually has the object of interest somewhere near the middle of the image.
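The magic-wand and "object near the middle" suggestions combine naturally: flood-fill from a seed pixel, and use the image center as the seed when no user click is available. A minimal Python sketch of a tolerance-based 4-connected flood fill on a grayscale image; this is a generic magic wand, not Paint.NET's actual implementation, and the names and tolerance are assumptions.

```python
from collections import deque


def magic_wand(image, seed=None, tolerance=10):
    """Select the 4-connected region whose grayscale values are within
    `tolerance` of the seed pixel. Seeds at the image center by default."""
    h, w = len(image), len(image[0])
    sx, sy = seed if seed is not None else (w // 2, h // 2)
    ref = image[sy][sx]
    mask = [[0] * w for _ in range(h)]
    mask[sy][sx] = 1
    queue = deque([(sx, sy)])
    while queue:
        x, y = queue.popleft()
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < w and 0 <= ny < h and not mask[ny][nx]:
                if abs(image[ny][nx] - ref) <= tolerance:
                    mask[ny][nx] = 1
                    queue.append((nx, ny))
    return mask


photo = [
    [0, 0, 0, 0, 0],
    [0, 200, 200, 0, 0],
    [0, 200, 205, 200, 0],
    [0, 0, 200, 0, 0],
    [0, 0, 0, 0, 0],
]
for row in magic_wand(photo):
    print(row)  # 1s mark the bright blob around the center
```

The selected mask's bounding box then gives the crop; the obvious failure mode is an object that is off-center or has no contrast with its surroundings.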

...but I'm not saying this is the solution to your problem; maybe something simpler will do.

Oh, and I'll keep looking for those links; they contain some really valuable information on this topic, but I can't promise anything.
