How can I improve the speed and accuracy of Tesseract / Tessnet2 recognition?

I have read that you can define a character whitelist to limit recognition errors.

But I could not find any documentation for the bool numericMode parameter in ocr.Init(@"c:\temp", "fra", false);

Suppose you only want to recognize numbers. Setting the whitelist to "0123456789" should give the best recognition results, but what does the numericMode parameter of the Init method do? I have only ever seen it set to false, even when the whitelist was "0123456789".

Also, what are the best Bitmap (PixelFormat) options for an image that you pass to tessnet?

2 answers

Recognizing only numbers is covered in the Tesseract FAQ. If you have version 3, you can simply run the command:

 tesseract image.tif outputbase nobatch digits 

In my experience, numeric mode limits the results to digits and auxiliary characters. I have seen "0123456789 , . + - / * % < > $ ( ) { }" and many more in the output. Currency symbols are allowed.
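For reference, here is a rough sketch of where the whitelist and the numericMode flag sit in a tessnet2 call. It reuses the Init arguments from the question and follows the usage pattern of the tessnet2 sample code. The image path is hypothetical, and passing true for numericMode is only there to show where the flag goes, not a recommendation.

```csharp
using System;
using System.Collections.Generic;
using System.Drawing;

class DigitsOnlyOcr
{
    static void Main()
    {
        // Hypothetical input file; replace with your own scan.
        using (Bitmap image = new Bitmap(@"c:\temp\numbers.tif"))
        {
            tessnet2.Tesseract ocr = new tessnet2.Tesseract();

            // Restrict recognition to digits via the whitelist.
            // The tessnet2 sample sets this before calling Init.
            ocr.SetVariable("tessedit_char_whitelist", "0123456789");

            // Same tessdata path and language as in the question.
            // The third argument is numericMode; try both values on your own data.
            bool numericMode = true;
            ocr.Init(@"c:\temp", "fra", numericMode);

            // Rectangle.Empty means "process the whole image".
            List<tessnet2.Word> words = ocr.DoOCR(image, Rectangle.Empty);
            foreach (tessnet2.Word word in words)
            {
                Console.WriteLine("{0} : {1}", word.Confidence, word.Text);
            }
        }
    }
}
```

Running the same image with numericMode set to false and then true, and comparing the printed words, is probably the quickest way to see what the flag changes for your fonts.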

Also, in my experience, reduced bit depth formats offer little benefit over a full color image. However, I was optimizing only for accuracy, not for speed. If your fonts are small (lowercase letters about 8 pixels high or less), then enlarging the image can really improve accuracy.
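A minimal sketch of enlarging a Bitmap with System.Drawing before handing it to the OCR engine. The helper name Enlarge, the scale factor, and the choice of a 24-bit RGB target with bicubic interpolation are my own assumptions for illustration, not something tessnet2 requires.

```csharp
using System.Drawing;
using System.Drawing.Drawing2D;
using System.Drawing.Imaging;

static class OcrPreprocessing
{
    // Hypothetical helper: returns a scaled copy of the source image.
    public static Bitmap Enlarge(Bitmap source, float factor)
    {
        int width = (int)(source.Width * factor);
        int height = (int)(source.Height * factor);

        // Full color target; an assumption, since bit depth made little
        // difference to accuracy in my tests.
        Bitmap result = new Bitmap(width, height, PixelFormat.Format24bppRgb);

        using (Graphics g = Graphics.FromImage(result))
        {
            // Bicubic interpolation keeps glyph edges smoother than
            // nearest-neighbor scaling.
            g.InterpolationMode = InterpolationMode.HighQualityBicubic;
            g.DrawImage(source, 0, 0, width, height);
        }
        return result;
    }
}
```

Usage would look like `Bitmap bigger = OcrPreprocessing.Enlarge(image, 2.0f);`, after which `bigger` is passed to DoOCR instead of the original. A factor of 2 to 3 is a reasonable starting point to experiment with; larger images will cost more processing time.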
