If you have time to develop the recognition yourself, I would do something like this:
- Get 1000 images or so and either transcribe them yourself or let people on Amazon Mechanical Turk do it for you; it will cost practically nothing. Now you have a labeled set to tune your algorithm on and to measure how well it does.
- As Ryan wrote, play with standard image filters (contrast, color, Gaussian blur, etc.), either manually or with something like http://www.roborealm.com/ . See if you can find a combination that makes the text really stand out.
- Try the OCR libraries again on the filtered images.
- If the libraries still do not work, use your knowledge of the image to split it into individual digits. You know how many digits there should be and roughly how many pixels each one takes up. Use edge detection or something like that to find the digits and separate them (maybe standard OCR feature extraction along with clustering will give you each digit as a cluster?).
- Perform standard OCR feature extraction on each digit (don't get too creative: use existing libraries, or at least read up on what is most common and simple), and pass these features, along with the labels you obtained in the first step, to a neural network or SVM.
- Improve your feature set until the classifier gets the digits right.
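The digit-splitting step could be sketched roughly like this. It is only an illustration, not a tested pipeline: it assumes the image has already been filtered and thresholded into a 0/1 grid (a list of rows), uses a plain column projection instead of edge detection, and the names `column_profile` and `split_digits` are made up for the example.

```python
def column_profile(binary):
    """Count foreground pixels in each column of a 0/1 image (list of rows)."""
    return [sum(col) for col in zip(*binary)]


def split_digits(binary, n_digits):
    """Return (start, end) column ranges, one per digit; end is exclusive.

    Assumption: digits are laid out left to right and n_digits is known.
    """
    profile = column_profile(binary)
    runs, start = [], None
    for x, count in enumerate(profile):
        if count > 0 and start is None:
            start = x                    # a run of inked columns begins
        elif count == 0 and start is not None:
            runs.append((start, x))      # the run ended just before column x
            start = None
    if start is not None:
        runs.append((start, len(profile)))

    if len(runs) == n_digits:
        return runs                      # blank gaps separated the digits cleanly
    if not runs:
        return []                        # nothing to split in an empty image

    # Fallback: digits touch or noise split them, so cut the inked span
    # into n_digits equal-width slices using the known digit count.
    left, right = runs[0][0], runs[-1][1]
    width = (right - left) / n_digits
    return [(left + round(i * width), left + round((i + 1) * width))
            for i in range(n_digits)]
```

If the gaps between digits are clean you get one range per digit directly; otherwise the known digit count is what saves you, which is the point of using what you know about the image.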
Since you only have ten digits that are reasonably consistent between the images, this should work.
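To make the feature and classifier steps a bit more concrete, here is a minimal sketch. Note the assumptions: the features are simple zone densities (one common, simple choice, not necessarily the best one), and a 1-nearest-neighbour lookup stands in for the neural network or SVM so that the example needs no libraries. The function names are made up, and each digit is again assumed to be a non-empty 0/1 grid.

```python
def zone_features(digit, rows=2, cols=2):
    """Fraction of foreground pixels in each cell of a rows x cols grid."""
    h, w = len(digit), len(digit[0])
    feats = []
    for r in range(rows):
        for c in range(cols):
            y0, y1 = r * h // rows, (r + 1) * h // rows
            x0, x1 = c * w // cols, (c + 1) * w // cols
            cell = [digit[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            feats.append(sum(cell) / len(cell) if cell else 0.0)
    return feats


def nearest_label(feats, training):
    """Label of the closest training example (1-nearest neighbour).

    training is a list of (feature_vector, label) pairs; this is a
    stand-in for the neural network or SVM, not a replacement for it.
    """
    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return min(training, key=lambda item: dist2(feats, item[0]))[1]


def accuracy(predicted, expected):
    """Share of predictions that match the hand-made labels."""
    hits = sum(p == e for p, e in zip(predicted, expected))
    return hits / len(expected)
```

The `accuracy` number on your labeled images is what you watch while you change the features: if a new feature set does not raise it, throw it out.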