Recognize MICR Font with OCR Engine?

Question

I use the Microsoft OCR library to read text.

The Microsoft OCR library works great. However, I want to read the MICR characters shown in this image (http://www.ict4u.net/databases/database-images/micr.jpg). Is there a way I can teach the OCR library to read these characters, or is there a language that allows it to read them?

+1

windows-phone ocr windows-runtime

Cloy Aug 08 '16 at 8:17

2 answers

Corneliakara · Answer 1 · 2016-08-09T17:00:51+0000

[Microsoft OCR team here] We do not yet support training the OCR engine to customize it for your use cases. However, we are actively monitoring Stack Overflow to find out what developers need, so we can continue to improve the OCR engine.

Elmue · Answer 2 · 2016-08-09T13:59:49+0000

I have been working with Microsoft OCR for a while. Compared to Tesseract, it has very basic features.

For example, Microsoft OCR returns words and lines, but the lines are nonsense. Two or three words are randomly grouped together as a "line", but they are not a real line of text. The "lines" are also completely out of order. In this respect, it is worse than Tesseract. You must take the coordinates of each word and order the words yourself.
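Ordering the words yourself can be sketched roughly as follows. This is a minimal illustration in Python, not the Microsoft API: the `Word` class, its fields, and the `overlap` tolerance are hypothetical stand-ins for whatever text and bounding rectangle the OCR engine reports for each word.

```python
from dataclasses import dataclass


@dataclass
class Word:
    # Hypothetical container for one recognized word and its bounding rectangle.
    text: str
    x: float       # left edge
    y: float       # top edge
    width: float
    height: float


def group_into_lines(words, overlap=0.5):
    """Group words into text lines by vertical position, top to bottom,
    and sort the words of each line from left to right."""
    lines = []
    # Visit words from top to bottom, using the vertical center of each box.
    for word in sorted(words, key=lambda w: w.y + w.height / 2):
        center = word.y + word.height / 2
        for line in lines:
            ref = line[0]
            ref_center = ref.y + ref.height / 2
            # Same line if the centers are closer than a fraction of the box height.
            if abs(center - ref_center) <= overlap * max(ref.height, word.height):
                line.append(word)
                break
        else:
            lines.append([word])
    return [" ".join(w.text for w in sorted(line, key=lambda w: w.x))
            for line in lines]
```

The tolerance is a guess that works for roughly horizontal text; skewed scans would need deskewing first or a smarter clustering step.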

Microsoft OCR does not return character rectangles, and there is absolutely no way to customize or train it. You can add languages through Windows Update via the "Basic Typing" feature, which includes OCR (see http://www.thewindowsclub.com/install-uninstall-languages-windows-10), but you cannot train your own language data.

MSDN says the following 25 languages are supported with varying degrees of accuracy:

Excellent: Czech, Danish, Dutch, English, Finnish, French, German, Hungarian, Italian, Norwegian, Polish, Portuguese, Romanian, Serbian Cyrillic, Serbian Latin, Slovak, Spanish and Swedish.
Very good: Simplified Chinese, Greek, Japanese, Russian and Turkish.
Good: Traditional Chinese and Korean.

Recognition quality is very similar to Tesseract. It even has the same problems as Tesseract. Some isolated characters are not recognized (for example, a single '$'), and it has the same huge problem with asterisks as Tesseract. It also inserts spaces in the wrong places, as Tesseract does. So I ask myself whether Microsoft uses Tesseract under the hood.

However, Microsoft OCR has an advantage over Tesseract: image preprocessing is much better. It doesn't matter if you have red text on a yellow background or white text on black. This is tricky for Tesseract, which needs a good-quality black and white image as input.
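As a rough illustration of the kind of preprocessing Tesseract needs, the sketch below converts RGB pixels to grayscale, thresholds them, and inverts the result when the background is dark, so that the text always ends up black on white. The function names, the mean-based threshold, and the assumption that text covers fewer pixels than the background are choices made for this example, not part of either library.

```python
def rgb_to_gray(rgb):
    """Convert rows of (r, g, b) tuples to rows of 0..255 luminance values."""
    return [[int(round(0.299 * r + 0.587 * g + 0.114 * b)) for r, g, b in row]
            for row in rgb]


def to_black_on_white(gray):
    """Binarize a grayscale image (rows of 0..255 ints) so that the text is
    black (0) and the background is white (255)."""
    pixels = [p for row in gray for p in row]
    # Assumption: the mean brightness is a usable threshold for this image.
    threshold = sum(pixels) / len(pixels)
    binary = [[255 if p > threshold else 0 for p in row] for row in gray]
    white = sum(v == 255 for row in binary for v in row)
    # Assumption: text covers fewer pixels than the background. If most pixels
    # are black, the image was light text on dark, so invert it.
    if white * 2 < len(pixels):
        binary = [[255 - v for v in row] for row in binary]
    return binary
```

A real pipeline would more likely use an image library with adaptive or Otsu thresholding; the point here is only the polarity and contrast normalization.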

For both OCR libraries, the following applies: if you have recognition problems, try enhancing the image. Even blurring the image can help a lot, because it reduces image noise.
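The blurring idea can be sketched with a simple 3x3 box blur over a grayscale image. This is a plain Python illustration with a hypothetical function name; in practice you would use the blur or median filter of your image library.

```python
def box_blur(gray):
    """Return a 3x3 box-blurred copy of a grayscale image (rows of 0..255 ints)."""
    height, width = len(gray), len(gray[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            # Average the pixel with its in-bounds neighbours.
            window = [
                gray[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
            ]
            row.append(sum(window) // len(window))
        out.append(row)
    return out
```

An isolated dark speck on a white background is pulled up toward the background value, so a later threshold step is less likely to treat it as part of a character.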
