I was hoping someone could tell me exactly why Tesseract has trouble recognizing some images of numbers, and whether there is anything I can do about it. Everything works with my test image, and since I only need digits, I thought the English trained data would be enough, until I had to start working with a 7-segment display.
Since I am having a lot of problems with the attached images, I would like to know whether I should start writing my own recognition algorithm, or whether I could create my own training data for Tesseract and get it working that way. Does anyone know where Tesseract's limitations lie?
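If you do go the custom route, one common approach for 7-segment displays is to skip general-purpose OCR: crop each digit, binarize it, check which of the seven segments are lit, and look the resulting pattern up in a table. The sketch below assumes an already cropped and binarized digit (`image[y][x] == true` for lit pixels). `SevenSegmentDecoder` and its segment regions (percentages of the digit box) are hypothetical, not part of Tesseract, and would need tuning for a real display.

```java
/**
 * Hypothetical 7-segment decoder (a sketch, not part of Tesseract).
 * Input: one cropped, binarized digit, image[y][x] == true for lit pixels.
 */
final class SevenSegmentDecoder {
    // Assumed segment regions as percentages of the digit box: {x0, x1, y0, y1}.
    // Order: a (top), b (top right), c (bottom right), d (bottom),
    // e (bottom left), f (top left), g (middle). Segment i maps to bit i.
    private static final int[][] REGIONS = {
        {30, 70, 0, 12},   // a
        {80, 100, 20, 40}, // b
        {80, 100, 60, 80}, // c
        {30, 70, 88, 100}, // d
        {0, 20, 60, 80},   // e
        {0, 20, 20, 40},   // f
        {30, 70, 44, 56},  // g
    };

    // Lit-segment bitmask for digits 0..9 (bit 0 = a ... bit 6 = g).
    private static final int[] DIGIT_MASKS = {
        0b0111111, 0b0000110, 0b1011011, 0b1001111, 0b1100110,
        0b1101101, 0b1111101, 0b0000111, 0b1111111, 0b1101111,
    };

    /** Returns the digit 0..9, or -1 if the lit pattern matches no digit. */
    static int decode(boolean[][] image) {
        int h = image.length, w = image[0].length;
        int mask = 0;
        for (int s = 0; s < REGIONS.length; s++) {
            int[] r = REGIONS[s];
            int lit = 0, total = 0;
            for (int y = r[2] * h / 100; y < r[3] * h / 100; y++) {
                for (int x = r[0] * w / 100; x < r[1] * w / 100; x++) {
                    total++;
                    if (image[y][x]) lit++;
                }
            }
            // A segment counts as "on" if more than half of its region is lit.
            if (total > 0 && lit * 2 > total) mask |= 1 << s;
        }
        for (int d = 0; d < DIGIT_MASKS.length; d++) {
            if (DIGIT_MASKS[d] == mask) return d;
        }
        return -1;
    }
}
```

Because each segment is judged by the fraction of lit pixels inside its own region, the gaps between segments do not affect the result; the trade-off is that each digit has to be cropped fairly tightly and straightened first.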
What I have tried: I set the page segmentation mode (PSM) to single line, single word and single character (cropping the picture for the latter). Single line and single word made no significant difference. Single character recognized a little better, but sometimes, because of the large gap between the segments, it attached an additional digit, which broke the result: with the attached image it came out as 04. I also tried binarizing the image myself, but that made recognition worse and was very resource-intensive. Inverting the images made no difference to Tesseract either.
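On the resource cost of binarization: a global Otsu threshold only needs one pass to build a 256-bin histogram, a scan over the 256 bins, and one more pass to apply the threshold, so it is linear in the number of pixels. Below is a minimal sketch in plain Java, assuming the ARGB pixels have already been read into an `int[]` (on Android, for example, with `Bitmap.getPixels`). The class and method names are hypothetical. Note that `binarize` marks pixels brighter than the threshold as foreground, so for dark LCD digits on a light background you would invert the result.

```java
/** Hypothetical helper: grayscale conversion + global Otsu binarization. */
final class OtsuBinarizer {
    /** Converts packed ARGB pixels to 0..255 luma (ITU-R BT.601 weights). */
    static int[] toGray(int[] argb) {
        int[] gray = new int[argb.length];
        for (int i = 0; i < argb.length; i++) {
            int p = argb[i];
            int r = (p >> 16) & 0xFF, g = (p >> 8) & 0xFF, b = p & 0xFF;
            gray[i] = (r * 299 + g * 587 + b * 114) / 1000;
        }
        return gray;
    }

    /** Otsu threshold t for values in 0..255; pixels > t are foreground. */
    static int otsuThreshold(int[] gray) {
        int[] hist = new int[256];
        for (int g : gray) hist[g]++;
        long total = gray.length;
        long sumAll = 0;
        for (int i = 0; i < 256; i++) sumAll += (long) i * hist[i];

        long weightBg = 0, sumBg = 0;
        double bestVar = -1;
        int best = 0;
        for (int t = 0; t < 256; t++) {
            weightBg += hist[t];              // pixels with value <= t
            if (weightBg == 0) continue;
            long weightFg = total - weightBg; // pixels with value > t
            if (weightFg == 0) break;
            sumBg += (long) t * hist[t];
            double meanBg = (double) sumBg / weightBg;
            double meanFg = (double) (sumAll - sumBg) / weightFg;
            // Between-class variance, up to a constant factor.
            double between = (double) weightBg * weightFg
                    * (meanBg - meanFg) * (meanBg - meanFg);
            if (between > bestVar) { bestVar = between; best = t; }
        }
        return best;
    }

    /** Returns a new mask: true where the pixel is brighter than the Otsu threshold. */
    static boolean[] binarize(int[] gray) {
        int t = otsuThreshold(gray);
        boolean[] out = new boolean[gray.length];
        for (int i = 0; i < gray.length; i++) out[i] = gray[i] > t;
        return out;
    }
}
```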
I have attached the images that I mainly need to process.
Image explanations:
- This image causes Tesseract no difficulty, although I made it in Word so that I could build the application around an image that works.
- This is the real-life photo corresponding to image_seven, but Tesseract cannot recognize it.
- This is another image I would like to recognize. Yes, I know it is skewed; I deskewed it (I think that is the right term here) before testing.
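Many 7-segment displays draw their digits slanted, and a slant is a shear rather than a rotation, so a rotation-based deskew can leave the digits tilted. One way to estimate the slant (a technique swapped in here as an assumption, not something from the question) is to try a range of horizontal shears and keep the one whose column projection is most concentrated, because vertical segments then fall into few columns. The sketch works on a binary `boolean[][]` image; `SlantCorrector`, the shear range of -0.5 to 0.5 and the 0.05 step are hypothetical choices.

```java
/** Hypothetical slant (shear) correction for a binary image, image[y][x]. */
final class SlantCorrector {
    /** Horizontal shift for row y: the top row moves most, the bottom row not at all. */
    private static int shift(double k, int y, int h) {
        return (int) Math.round(k * (h - 1 - y));
    }

    /** Sum of squared column counts after shearing by k (assumes |k| <= 1). */
    static long score(boolean[][] image, double k) {
        int h = image.length, w = image[0].length;
        long[] cols = new long[w + 2 * h]; // offset by h so shifted indices stay >= 0
        for (int y = 0; y < h; y++) {
            int s = shift(k, y, h);
            for (int x = 0; x < w; x++) {
                if (image[y][x]) cols[x - s + h]++;
            }
        }
        long sum = 0;
        for (long c : cols) sum += c * c; // higher = strokes packed into fewer columns
        return sum;
    }

    /** Tries k = -0.5 .. 0.5 in steps of 0.05 and returns the best-scoring shear. */
    static double estimateSlant(boolean[][] image) {
        double bestK = 0;
        long bestScore = -1;
        for (int i = -10; i <= 10; i++) {
            double k = i / 20.0;
            long sc = score(image, k);
            if (sc > bestScore) { bestScore = sc; bestK = k; }
        }
        return bestK;
    }

    /** Applies the shear; pixels shifted outside the image are dropped. */
    static boolean[][] shear(boolean[][] image, double k) {
        int h = image.length, w = image[0].length;
        boolean[][] out = new boolean[h][w];
        for (int y = 0; y < h; y++) {
            int s = shift(k, y, h);
            for (int x = 0; x < w; x++) {
                int nx = x - s;
                if (image[y][x] && nx >= 0 && nx < w) out[y][nx] = true;
            }
        }
        return out;
    }
}
```

Usage would be `shear(img, estimateSlant(img))` on the binarized crop before it is passed to Tesseract or to a per-segment decoder.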
android ocr tesseract
Anders metnik