Tesseract-OCR recognition accuracy and speed (3.02)

Question

I have a group of very small images (w: 70-100; h: 12-20) as shown below:

Each image contains nothing but the nickname of a group member. I want to read the text from these simple images; they all share the same background and differ only in the name. Here is what I did with this image:

I use the code below to get text from the second image:

    tesseract::TessBaseAPI ocr;
    ocr.Init(NULL, "eng");
    PIX* pix = pixRead("D:\\image.png");
    ocr.SetImage(pix);
    std::string result = ocr.GetUTF8Text();

I have 2 problems with this:

1. ocr.GetUTF8Text(); is slow: 650-750 ms. The image is small, so why does it take so long?
2. From the image above I get results like "iwillkillsm", "iwillkillsel", etc. The image is simple, and I believe Tesseract gurus could get it recognized with 100% accuracy.

What should I do with the image or the code, or what should I read (and where) about tesseract-ocr recognition speed and quality, to solve these problems?

image tesseract

Anton Kasabutski Jul 02 '16 at 5:49

1 answer

nlloyd · Accepted Answer · 2016-07-02T06:25:43+0000

This may seem strange, but I have always had better luck with Tesseract when I enlarged the image first. The enlarged image looks "worse" to me, but Tesseract ran faster and was much more accurate.

There is a limit to how large you can make the images before the results start getting worse, though :) I think I remember aiming for about 600 pixels in the past. You will have to experiment with it.
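To make the enlarging step concrete, here is a minimal sketch in plain C++ that scales an 8-bit grayscale buffer up by an integer factor with bilinear interpolation. The GrayImage struct and the scaleFactorFor and upscale helpers are hypothetical names made up for this illustration, not part of Tesseract or Leptonica, and the 600-pixel target is only the rough figure recalled above, not a tuned value.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper type: an 8-bit grayscale image stored row by row.
struct GrayImage {
    int width = 0;
    int height = 0;
    std::vector<std::uint8_t> pixels;  // width * height bytes
};

// Smallest integer factor that brings `width` to at least `targetWidth`.
int scaleFactorFor(int width, int targetWidth) {
    if (width <= 0) return 1;
    return std::max(1, (targetWidth + width - 1) / width);
}

// Clamp a source coordinate into the valid range [0, size - 1].
static float clampCoord(float v, int size) {
    return std::min(std::max(v, 0.0f), static_cast<float>(size - 1));
}

// Enlarge `src` by an integer factor using bilinear interpolation.
GrayImage upscale(const GrayImage& src, int factor) {
    assert(factor >= 1);
    assert(src.pixels.size() ==
           static_cast<std::size_t>(src.width) * src.height);

    GrayImage dst;
    dst.width = src.width * factor;
    dst.height = src.height * factor;
    dst.pixels.resize(static_cast<std::size_t>(dst.width) * dst.height);

    // Read one source pixel as a float.
    auto at = [&src](int x, int y) {
        return static_cast<float>(
            src.pixels[static_cast<std::size_t>(y) * src.width + x]);
    };

    for (int y = 0; y < dst.height; ++y) {
        // Map the destination pixel centre back into source coordinates.
        const float sy = clampCoord((y + 0.5f) / factor - 0.5f, src.height);
        const int y0 = static_cast<int>(sy);
        const int y1 = std::min(y0 + 1, src.height - 1);
        const float fy = sy - y0;

        for (int x = 0; x < dst.width; ++x) {
            const float sx = clampCoord((x + 0.5f) / factor - 0.5f, src.width);
            const int x0 = static_cast<int>(sx);
            const int x1 = std::min(x0 + 1, src.width - 1);
            const float fx = sx - x0;

            // Blend the four neighbouring source pixels.
            const float top = (1.0f - fx) * at(x0, y0) + fx * at(x1, y0);
            const float bottom = (1.0f - fx) * at(x0, y1) + fx * at(x1, y1);
            const float v = (1.0f - fy) * top + fy * bottom;

            dst.pixels[static_cast<std::size_t>(y) * dst.width + x] =
                static_cast<std::uint8_t>(v + 0.5f);  // round to nearest
        }
    }
    return dst;
}
```

For an 80-pixel-wide nickname image, scaleFactorFor(80, 600) gives a factor of 8, so the enlarged image would be 640 pixels wide. The resulting buffer could then be handed to the raw-buffer overload of TessBaseAPI::SetImage with one byte per pixel and a line stride equal to the width. If the image is already loaded as a Leptonica PIX, as in the question, pixScale should serve the same purpose without a hand-written loop.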
