Splitting an image of text into its component character images

I would like to split an image of text into its component characters, each saved as a separate image. Using the sample below, for example, I would get 14 images.

The text will always be on a single line, so the y-extent does not matter: I only need to find where each letter begins and ends horizontally and crop at those coordinates. That way I would also avoid problems with "i", "j", etc.

I am new to image processing and am not sure how to approach this. Some form of edge detection? Is there a way to find contiguous areas of solid colour? Any help is appreciated.

I am trying to improve my Python skills and get to know some of the many libraries available, so I am using the Python Imaging Library (PIL), but I have also looked at OpenCV.


Image example:

This is some text

+4
6 answers

This is not an easy task, especially if the background is not homogeneous. If you already have a binary image like the example, it is a bit simpler.

You can start by applying a thresholding algorithm if your image is not already binary (Otsu's adaptive threshold works well).
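To see what Otsu's method actually does, here is a minimal pure-Python sketch; in practice you would just call a library routine such as OpenCV's cv2.threshold with the THRESH_OTSU flag. It simply picks the threshold that maximises the between-class variance of the grey-level histogram.

```python
def otsu_threshold(pixels):
    """Pick the threshold maximising between-class variance (Otsu's method).

    `pixels` is a flat sequence of 8-bit grey levels (0..255).
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    total_sum = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0.0
    for t in range(256):
        w_bg += hist[t]                 # background pixel count
        if w_bg == 0:
            continue
        w_fg = total - w_bg             # foreground pixel count
        if w_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (total_sum - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

# A strongly bimodal "image": dark ink (grey 20) on a light page (grey 220).
pixels = [20] * 30 + [220] * 70
t = otsu_threshold(pixels)
binary = [0 if p <= t else 255 for p in pixels]
```

On a clean scan of black text on white paper the histogram is strongly bimodal, which is exactly the case where Otsu works best.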

After that you can use a labelling algorithm to identify each "island" of pixels that forms a shape (each character, in this case).

The problem arises when you have noise: shapes that get labelled but that you are not interested in. In that case you can use heuristics to decide whether a shape is a character or not (the normalised area, the position of the object if your text is always in a certain place, and so on). If that is not enough, you will have to resort to more sophisticated techniques, such as feature-extraction algorithms plus some kind of pattern-recognition algorithm, like a multi-layer perceptron.
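To make the labelling step concrete, here is a minimal two-pass connected-component labelling in plain Python (4-connectivity only, for brevity; real code would use something like OpenCV's cv2.connectedComponentsWithStats, which also returns the per-label areas you need for the noise heuristic):

```python
def label_components(grid):
    """Two-pass connected-component labelling (4-connectivity).

    `grid` is a list of rows of 0/1; returns (labels, n_components).
    """
    h, w = len(grid), len(grid[0])
    labels = [[0] * w for _ in range(h)]
    parent = {}                          # union-find over provisional labels

    def find(a):
        while parent[a] != a:
            a = parent[a]
        return a

    next_label = 1
    for y in range(h):
        for x in range(w):
            if not grid[y][x]:
                continue
            up = labels[y - 1][x] if y else 0
            left = labels[y][x - 1] if x else 0
            if up and left:
                labels[y][x] = find(up)
                parent[find(left)] = find(up)    # record the equivalence
            elif up or left:
                labels[y][x] = find(up or left)
            else:
                labels[y][x] = next_label        # a brand-new island
                parent[next_label] = next_label
                next_label += 1

    # Second pass: resolve equivalences into compact final labels.
    remap = {}
    for y in range(h):
        for x in range(w):
            if labels[y][x]:
                root = find(labels[y][x])
                remap.setdefault(root, len(remap) + 1)
                labels[y][x] = remap[root]
    return labels, len(remap)

# Two separate blobs: a 2x2 square and a short vertical bar.
grid = [[1, 1, 0, 0],
        [1, 1, 0, 1],
        [0, 0, 0, 1]]
labels, n = label_components(grid)
```

Filtering noise then amounts to counting the pixels per label and discarding any label whose count falls below a chosen threshold.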

To finish: this seems like an easy task, but depending on the quality of your image it can get complicated. The algorithms mentioned here are easy to find on the Internet and are also implemented in libraries such as OpenCV.

If you need more help, just ask; I will help if I can ;)

+5

I know I'm several years late :-), but you can do this pretty easily with ImageMagick on the command line, without compiling anything, since it now has connected-component analysis built in.

Here is one way to do it:

    #!/bin/bash
    image="$1"
    draw=$(convert "$image"                              \
        -threshold 50%                                   \
        -define connected-components:verbose=true        \
        -define connected-components:area-threshold=10   \
        -connected-components 8                          \
        -auto-level objects.png |                        \
      awk 'BEGIN { command = "" }
           /\+0\+0/ || /id:/ { next }        # skip the header and the background object
           {
             geom = $2
             gsub(/x/,  " ", geom)
             gsub(/\+/, " ", geom)           # "WxH+X+Y" -> "W H X Y"
             split(geom, a, " ")
             d = sprintf("-draw \x27rectangle %d,%d %d,%d\x27 ", a[3], a[4], a[3]+a[1], a[4]+a[2])
             command = command d
           }
           END { print command }')
    eval convert "$image" -fill none -strokewidth 2 -stroke red $draw result.png

The result is as follows:

[image: the input text with a red rectangle drawn around each detected character]

First, I threshold your image at 50% so that it contains only pure black and white, with no intermediate tones. Then I tell ImageMagick to output the data about the bounding boxes it finds, and that I am not interested in objects smaller than 10 pixels in total area. I also specify that pixels are 8-connected, i.e. connected to their diagonal neighbours (NE, SE, NW, SW) as well as to the neighbours on their left, right, top and bottom. Finally, I parse the bounding-box output with awk and draw red rectangles around the bounding boxes.

The raw output of the convert command, which I then parse with awk, looks like this:

    Objects (id: bounding-box centroid area mean-color):
      0: 539x53+0+0 263.7,24.3 20030 srgba(255,255,255,1)
      11: 51x38+308+14 333.1,30.2 869 srgba(0,0,0,1)
      13: 35x39+445+14 461.7,32.8 670 srgba(0,0,0,1)
      12: 35x39+365+14 381.7,32.8 670 srgba(0,0,0,1)
      2: 30x52+48+0 60.4,27.0 634 srgba(0,0,0,1)
      1: 41x52+1+0 20.9,16.6 600 srgba(0,0,0,1)
      8: 30x39+174+14 188.3,33.1 595 srgba(0,0,0,1)
      7: 30x39+102+14 116.3,33.1 595 srgba(0,0,0,1)
      9: 30x39+230+14 244.3,33.1 595 srgba(0,0,0,1)
      10: 35x39+265+14 282.2,33.0 594 srgba(0,0,0,1)
      16: 33x37+484+15 500.2,33.0 520 srgba(0,0,0,1)
      17: 22x28+272+19 282.3,32.8 503 srgba(255,255,255,1)
      5: 18x51+424+2 432.5,27.9 389 srgba(0,0,0,1)
      6: 18x51+520+2 528.5,27.9 389 srgba(0,0,0,1)
      15: 6x37+160+15 162.5,33.0 222 srgba(0,0,0,1)
      14: 6x37+88+15 90.5,33.0 222 srgba(0,0,0,1)
      18: 22x11+372+19 382.6,24.9 187 srgba(255,255,255,1)
      19: 22x11+452+19 462.6,24.9 187 srgba(255,255,255,1)
      3: 6x8+88+0 90.5,3.5 48 srgba(0,0,0,1)
      4: 6x8+160+0 162.5,3.5 48 srgba(0,0,0,1)

and awk turns this into:

    convert http://imgur.com/AVW7A.png -fill none -strokewidth 2 -stroke red \
      -draw 'rectangle 308,14 359,52' \
      -draw 'rectangle 445,14 480,53' \
      -draw 'rectangle 365,14 400,53' \
      -draw 'rectangle 48,0 78,52'    \
      -draw 'rectangle 1,0 42,52'     \
      -draw 'rectangle 174,14 204,53' \
      -draw 'rectangle 102,14 132,53' \
      -draw 'rectangle 230,14 260,53' \
      -draw 'rectangle 265,14 300,53' \
      -draw 'rectangle 484,15 517,52' \
      -draw 'rectangle 272,19 294,47' \
      -draw 'rectangle 424,2 442,53'  \
      -draw 'rectangle 520,2 538,53'  \
      -draw 'rectangle 160,15 166,52' \
      -draw 'rectangle 88,15 94,52'   \
      -draw 'rectangle 372,19 394,30' \
      -draw 'rectangle 452,19 474,30' \
      -draw 'rectangle 88,0 94,8'     \
      -draw 'rectangle 160,0 166,8'   \
      result.png
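If you would rather stay in Python than use awk, the same verbose output can be parsed with a few lines of stdlib code. parse_boxes below is a hypothetical helper, and the geometry format WxH+X+Y is exactly as printed above:

```python
import re

# One "id: WxH+X+Y centroid area mean-color" line per object, as printed by
# ImageMagick's -define connected-components:verbose=true.
GEOM = re.compile(r'^\s*(\d+):\s+(\d+)x(\d+)\+(\d+)\+(\d+)')

def parse_boxes(verbose_output):
    """Return (left, top, right, bottom) boxes, skipping the whole-image object."""
    boxes = []
    for line in verbose_output.splitlines():
        m = GEOM.match(line)
        if not m:
            continue                       # header line etc.
        _id, w, h, x, y = map(int, m.groups())
        if (x, y) == (0, 0):
            continue                       # the 539x53+0+0 background object
        boxes.append((x, y, x + w, y + h))
    return boxes

sample = """Objects (id: bounding-box centroid area mean-color):
  0: 539x53+0+0 263.7,24.3 20030 srgba(255,255,255,1)
  11: 51x38+308+14 333.1,30.2 869 srgba(0,0,0,1)
  13: 35x39+445+14 461.7,32.8 670 srgba(0,0,0,1)"""
boxes = parse_boxes(sample)
```

Each resulting box is already in PIL's (left, top, right, bottom) order, so it can be passed straight to Image.crop to produce the per-character images the question asks for.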
+5

You can start with a simple connected-component analysis (CCA) algorithm, which can be implemented quite efficiently as a two-pass scanline algorithm (you just track merged regions and relabel at the end). This gives you a separately numbered "blob" for each contiguous region, which works for most (but not all) letters. You can then simply take the bounding box of each connected blob, which gives you a frame around each one. For efficiency you can even maintain the bounding boxes while running the CCA.

So for your example, the first word on the left would, after CCA, come out something like:

    1111111  2        3
       1     2
       1     2 4444   5  666
       1     22   4   5  6
       1     2    4   5  666
       1     2    4   5    6
       1     2    4   5  666

with the equivalence class 4 = 2 (the arch of the "h" is given a new label before the scan discovers that it joins the stem).

The bounding boxes of each blob then give you the area around each letter. You will run into problems with letters like "i" and "j", but those can be handled specially: for example, look for a region smaller than a certain size sitting directly above another region of about the same width (a rough heuristic).
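That heuristic might look like the following sketch on (left, top, right, bottom) bounding boxes; the area threshold and the overlap rule here are made-up values for illustration, not a standard recipe:

```python
def merge_dots(boxes, max_dot_area=60):
    """Merge small boxes (dots of 'i'/'j') into the box directly below them.

    `boxes` are (left, top, right, bottom) tuples. A "dot" is any box whose
    area is small and whose horizontal span overlaps a box starting below it.
    """
    def area(b):
        return (b[2] - b[0]) * (b[3] - b[1])

    dots = [b for b in boxes if area(b) <= max_dot_area]
    bodies = [b for b in boxes if area(b) > max_dot_area]

    merged = []
    for body in bodies:
        l, t, r, btm = body
        for d in dots:
            overlaps = d[0] < r and d[2] > l      # horizontal spans overlap
            if overlaps and d[3] <= body[1]:      # and the dot sits above the body
                l, t = min(l, d[0]), min(t, d[1])
                r = max(r, d[2])
        merged.append((l, t, r, btm))
    return merged

# The stem of an 'i', its dot just above it, and an ordinary letter beside it.
boxes = [(88, 15, 94, 52), (88, 0, 94, 8), (102, 14, 132, 53)]
merged = merge_dots(boxes)
```

After merging, the stem and dot come out as one box covering the whole "i", while the neighbouring letter is left untouched.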

The cvBlobsLib library in OpenCV should do most of this for you.

+2

Hmm, this is actually very simple for the sample you provided:

    start at the left edge
    go right one column at a time
    until the current column contains black (a letter)
        -> this is the start of a character
    go right again until the current column contains no black at all
        -> this is the end of the character
    repeat until the end of the image

(By the way, the same approach also works for splitting a paragraph into lines.)
If the letters overlap or share columns, it gets a little more difficult.

Edit:

@Andres: no, it works fine for a "U"; you have to look at the whole of each column:

    UU UU
    UU UU
     UUU
    01234

    columns 0,4: everything except the bottom row
    columns 1-3: only the bottom row
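The column scan described in this answer fits in a few lines of Python; here is a sketch on a binary grid (1 = black) that returns a (start, end) column pair per character:

```python
def split_columns(grid):
    """Scan left to right; a character spans a maximal run of columns
    containing at least one black (1) pixel."""
    width = len(grid[0])
    has_ink = [any(row[x] for row in grid) for x in range(width)]

    spans, start = [], None
    for x in range(width):
        if has_ink[x] and start is None:
            start = x                      # first black column: a character starts
        elif not has_ink[x] and start is not None:
            spans.append((start, x))       # blank column: the character ended
            start = None
    if start is not None:
        spans.append((start, width))       # a character running into the edge
    return spans

# Two "letters" separated by one blank column.
grid = [[1, 1, 0, 1],
        [1, 0, 0, 1],
        [1, 1, 0, 1]]
spans = split_columns(grid)
```

Each (start, end) pair can then be used to crop, e.g. img.crop((start, 0, end, height)) with PIL, which is exactly what the question asks for.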
+2

I recently played with ocropus, an open-source tool for text analysis and OCR preprocessing. As part of its workflow it also produces exactly the images you need. Perhaps it will help you, although no Python magic is involved.

+1

The problem you pose is genuinely hard, and it took some of the best image-processing researchers in the world quite some time to solve. The solution is a key part of DjVu's approach to compressing and displaying images: the first step in compressing a document is to segment it into characters. That information is then used for compression, because the image of one lower-case "e" is very similar to another, so the compressed document only needs to store the differences. You will find links to a bunch of technical papers at http://djvu.org/resources/; a good place to start is "High-quality document image compression with DjVu".

Many of the tools in the DjVu suite have been open-sourced as djvulibre; unfortunately, I have not been able to figure out how to extract the foreground (or the individual characters) using the existing command-line tools. I would be very interested to learn how.

+1
