Removing Unwanted Symbols on an Organic Molecule Chart

I have this image:

enter image description here

I want to remove every part of the image that is not part of the structure of the organic molecule. In this particular image, I want to remove "Process A" and the line below it. I tried using bwlabel to label connected components, but the structure itself does not form a single component, so removal by this method is not possible. Any idea how I can solve this problem?

2 answers

There are two ways to approach this, depending on your preference.

Method #1 - Using bwareaopen

A cheap way to do this is to invert the image so that the object pixels are white rather than black, then morphologically close the image and remove the regions whose area falls below a certain amount. Closing joins disjoint regions that lie near each other. Because joining the pieces of the "structure" produces one region with a large area, you can examine the area of each region and eliminate those that fall below a threshold.

You can then get back to the original image by doing a logical AND between the inverted image and the closed result, and then re-inverting this intermediate result. The AND is needed because the closing operation artificially creates object pixels: joining neighbouring parts of the structure fills in the gaps between them, and the AND discards every pixel that was not an object pixel in the original. Since all of this is done on the inverse of the original, re-inverting returns you to the original convention, where object pixels are black rather than white.

Something like this:

    % Read in image from StackOverflow
    im = imread('http://i.stack.imgur.com/A7iT7.png');

    % Invert image
    im = ~im;

    % Define 50 x 50 structuring element and close the image
    se = strel('square', 50);
    out = imclose(im, se);

    % Remove regions whose areas fall below 10000 pixels
    out = bwareaopen(out, 10000);

    % Remove extraneous pixels created by closing by ANDing with the
    % inverted image, then re-invert to get back to the original label scheme
    out = ~(im & out);

    % Show the image
    imshow(out);

We get this image:

enter image description here

Notes

  • The imclose function performs a morphological closing for you using the structuring element defined by strel. I used a 50 x 50 square to make sure the window is large enough to join neighbouring object pixels.
  • The bwareaopen function takes a binary image and removes connected regions whose area in pixels is below a given value. After closing, you will have two connected regions: the upper part of the image with the structure and the lower part with the text. Experimentally, a threshold of 10,000 pixels removed the lower region.
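The threshold in the second note was found by experiment. One way to choose it less blindly is to list the area of each connected region after closing. The sketch below is only an illustration: it uses a small synthetic stand-in image so it runs without the original picture, and the fragment positions, the 30 x 30 structuring element, and the `minArea` value are all assumptions, not values from the answer. It needs the Image Processing Toolbox.

```matlab
% Synthetic stand-in for the inverted image (object pixels are true).
bw = false(200, 300);
bw(20:40, 20:60)     = true;   % structure fragment 1
bw(20:40, 80:120)    = true;   % structure fragment 2 (19-pixel gap to fragment 1)
bw(60:80, 50:90)     = true;   % structure fragment 3 (19-pixel gap below fragment 1)
bw(150:160, 100:130) = true;   % "caption", far below the structure

% Close with a window larger than the gaps inside the structure
% (30 x 30 is an assumption chosen for this toy image).
closed = imclose(bw, strel('square', 30));

% List the area of every connected region, largest first.
stats = regionprops(closed, 'Area');
areas = sort([stats.Area], 'descend');
disp(areas);

% Pick a threshold between the two areas (hypothetical value for this image).
minArea = 1000;
kept = bwareaopen(closed, minArea);

% Keep only original object pixels, then go back to black-on-white.
out = ~(bw & kept);
imshow(out);
```

If the printed areas show a clear gap between the largest region and the rest, any value inside that gap should work as the threshold.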

Method #2 - Using regionprops

Related to Method #1, an alternative that does not depend on a threshold is to go with your original idea. Perform the closing operation, but then evaluate the area of each connected region and select the one with the largest area. For this I recommend regionprops, a function designed to measure properties of the individual regions in an image. The result is a structure array of N elements, where N is the total number of distinct connected objects found in the image, and each element contains fields for the properties you asked to be measured. In your case, specify 'Area' and 'PixelIdxList', which contain the area and the column-major linear pixel indices of each region.

You then find the overall maximum area, use the corresponding pixel locations to build an output mask, and logically AND that mask with the inverted image as before.

Something like this:

    % Read in image from StackOverflow
    im = imread('http://i.stack.imgur.com/A7iT7.png');

    % Invert image
    im = ~im;

    % Define 50 x 50 structuring element and close the image
    se = strel('square', 50);
    out = imclose(im, se);

    % Apply regionprops
    s = regionprops(out, 'Area', 'PixelIdxList');

    % Find the region with the max area
    [~, id] = max([s.Area]);

    % Create a logical output mask for the largest region
    out = false(size(im));

    % Set pixels from the largest region
    out(s(id).PixelIdxList) = true;

    % Rest of the logic from before:
    % remove extraneous pixels created by closing by ANDing with the
    % inverted image, then re-invert to get back to the original label scheme
    out = ~(im & out);

    % Show the image
    imshow(out);

You should get exactly the same results as the first method.
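As a side note that is not part of the original answer: in MATLAB releases whose Image Processing Toolbox provides bwareafilt, the "keep the largest region" step can be written as a single call. The sketch below compares that call with the regionprops route on a synthetic stand-in image; the image contents and the 30 x 30 structuring element are assumptions made only for illustration.

```matlab
% Synthetic stand-in for the inverted image (object pixels are true).
bw = false(200, 300);
bw(20:40, 20:60)     = true;   % structure fragment 1
bw(20:40, 80:120)    = true;   % structure fragment 2
bw(60:80, 50:90)     = true;   % structure fragment 3
bw(150:160, 100:130) = true;   % "caption", far below the structure

closed = imclose(bw, strel('square', 30));

% Route A: regionprops, as in Method #2.
s = regionprops(closed, 'Area', 'PixelIdxList');
[~, id] = max([s.Area]);
largestA = false(size(bw));
largestA(s(id).PixelIdxList) = true;

% Route B: bwareafilt keeps the n largest objects (here n = 1).
largestB = bwareafilt(closed, 1);

% Both masks should select the same region.
disp(isequal(largestA, largestB));

% Same final step as before: keep original pixels only, then re-invert.
out = ~(bw & largestB);
```

Whichever route you use, the final AND with the inverted image is still needed to discard the pixels that closing filled in.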


Assuming the caption is spatially separated far enough from the "actual image":

Build blobs by blurring the image, find the connected components, and take the topmost one (or use another heuristic that depends on your data). So, before running the connected-component algorithm, do some preprocessing:

  1. Gaussian / median filter (if necessary) and edge detection.
  2. Binarization.
  3. Morphological operations (erosion, dilation).
  4. Select the blob with a heuristic (size / shape / position).

Step 4 takes the place of plain connected-component labelling (which is optional). You can search for other methods under the keywords "blob extraction" or "text extraction". This is a rough outline of what you would do in general. Which steps give the best result depends on your data, so you will have to experiment a bit.
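The outline above could be sketched roughly as follows. This is only one possible reading of the steps, not the answerer's code: the synthetic grayscale image, the fixed threshold of 128, the 25 x 25 dilation window, and the "topmost blob" heuristic are all assumptions, and edge detection is skipped because the toy image does not need it. It requires the Image Processing Toolbox.

```matlab
% Synthetic grayscale stand-in: white background, black strokes.
gray = 255 * ones(200, 300, 'uint8');
gray(20:40, 20:60)     = 0;    % structure fragment 1
gray(20:40, 80:120)    = 0;    % structure fragment 2
gray(60:80, 50:90)     = 0;    % structure fragment 3
gray(150:160, 100:130) = 0;    % "caption", far below the structure

% Step 1: optional denoising ('symmetric' padding avoids dark borders).
smoothed = medfilt2(gray, [3 3], 'symmetric');

% Step 2: binarize with a fixed threshold (assumed); dark pixels are foreground.
fg = smoothed < 128;

% Step 3: dilate so that nearby strokes merge into one blob.
blobs = imdilate(fg, strel('square', 25));

% Step 4: pick a blob by heuristic; here, the one whose bounding box starts highest.
stats = regionprops(blobs, 'BoundingBox', 'PixelIdxList');
boxes = vertcat(stats.BoundingBox);   % one row per blob: [x y width height]
[~, top] = min(boxes(:, 2));
mask = false(size(fg));
mask(stats(top).PixelIdxList) = true;

% Keep only the original dark pixels that fall inside the chosen blob.
keep = mask & (gray < 128);
result = 255 * ones(size(gray), 'uint8');
result(keep) = 0;
imshow(result);
```

For other images, a size or shape criterion in step 4 may work better than position; that choice is the part you would need to tune to your data.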
