How can I improve the speed and accuracy of Tesseract / Tessnet2 recognition?

Question


I have read that you can define a character whitelist to limit recognition errors.

But I could not find any documentation for the bool numericMode parameter in ocr.Init(@"c:\temp", "fra", false);

Suppose you only want to recognize digits. Setting the whitelist to "0123456789" should give the best recognition results, but what does the numericMode parameter of the Init method do? I have always left it set to false, even when the whitelist was "0123456789".

Also, what are the best Bitmap (PixelFormat) options for the image you pass to tessnet?


performance ocr tesseract tessnet2

Relok Sep 14 '11 at 12:10


2 answers

Jerry · Answer 1 · 2011-09-29T07:54:13+0000

Recognizing digits only is covered in the Tesseract FAQ. If you have version 3, you can simply run this command:

 tesseract image.tif outputbase nobatch digits
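For scripted use, the same invocation can be wrapped in a few lines of Python. This is only a minimal sketch: it assumes the tesseract executable is on the PATH, and `digits_command` and `run_digits_ocr` are hypothetical helper names, not part of Tesseract or Tessnet2.

```python
import subprocess
from pathlib import Path


def digits_command(image_path, output_base):
    """Build the argument list for the digits-only invocation shown above."""
    return ["tesseract", str(image_path), str(output_base), "nobatch", "digits"]


def run_digits_ocr(image_path, output_base):
    """Run Tesseract and return the recognized text.

    Assumes the tesseract executable is on PATH. Tesseract writes its
    result to <output_base>.txt, which is read back here.
    """
    subprocess.run(digits_command(image_path, output_base), check=True)
    return Path(f"{output_base}.txt").read_text(encoding="utf-8")
```

Calling `run_digits_ocr("image.tif", "outputbase")` would run the command above and return the contents of outputbase.txt.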

Thinkable · Answer 2 · 2012-07-30T23:47:29+0000

In my experience, numeric mode limits the results to digits and auxiliary characters. I have seen "0123456789 ,. + - / *% <> $ () {}" and more. Currency symbols are allowed.
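Because numeric mode can still return those auxiliary characters, it may help to check the recognized text against the digit whitelist afterwards. The following is a rough Python sketch of such a post-filter; `keep_whitelisted` and `is_whitelisted` are hypothetical helpers that work on whatever string the OCR step returned, and they are not part of Tessnet2.

```python
DIGITS = "0123456789"


def keep_whitelisted(text, whitelist=DIGITS):
    """Drop every character that is not in the whitelist."""
    allowed = set(whitelist)
    return "".join(ch for ch in text if ch in allowed)


def is_whitelisted(text, whitelist=DIGITS):
    """Return True if text is non-empty and uses only whitelisted characters."""
    return bool(text) and set(text) <= set(whitelist)


print(keep_whitelisted("$1,234.50"))  # prints 123450
print(is_whitelisted("12.5"))         # prints False
```

Stripping changes the value when punctuation carries meaning (the decimal point is lost in the example), so rejecting a result with `is_whitelisted` is probably the safer choice for amounts.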

Also, in my experience, reduced bit depth formats offer little benefit over a full-color image. However, I was optimizing only for accuracy, not for speed. If your fonts are small (lowercase >= 8 pixels high), enlarging the image can really improve accuracy.
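The enlarging step can be sketched in plain Python on a grayscale pixel grid. This is an illustration only: the 20-pixel target height is an assumed value, not a threshold from this answer or from the Tesseract documentation, and `scale_factor` and `upscale_nearest` are hypothetical helpers. In practice an imaging library with proper interpolation would do the resizing.

```python
import math


def scale_factor(lowercase_height_px, target_height_px=20):
    """Integer factor that brings lowercase letters up to at least the target height.

    target_height_px is an illustrative assumption, not a documented threshold.
    """
    if lowercase_height_px <= 0:
        raise ValueError("lowercase_height_px must be positive")
    return max(1, math.ceil(target_height_px / lowercase_height_px))


def upscale_nearest(pixels, factor):
    """Enlarge a grayscale image (a list of rows) by repeating each pixel."""
    enlarged = []
    for row in pixels:
        # Repeat every pixel horizontally, then repeat the widened row vertically.
        wide_row = [value for value in row for _ in range(factor)]
        enlarged.extend(list(wide_row) for _ in range(factor))
    return enlarged


factor = scale_factor(8)                       # 3 with the assumed 20 px target
print(upscale_nearest([[0, 255]], 2))          # prints [[0, 0, 255, 255], [0, 0, 255, 255]]
```

Nearest-neighbour repetition is used here only because it needs no dependencies. A smoother resampling method may give different results for recognition.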

