I need to call Tesseract OCR (an open source C++ library for optical character recognition) from a Java application server. Right now the easy option is to run the executable with Runtime.exec(). The basic logic would be:
- Save the image that is currently held in memory to a file (a.tif).
- Pass the image file name to the tesseract command-line program.
- Read the text output file back into Java using FileReader.
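For reference, the steps above could be sketched roughly as follows. This is only an illustration, not tested against a real server setup: the class name `TesseractCli`, the helper `buildCommand`, and the temp file names are made up for the example, and it assumes a `tesseract` binary is on the PATH and invoked as `tesseract <image> <outputbase>`, which writes `<outputbase>.txt`. It uses ProcessBuilder rather than Runtime.exec() and reads the result with `Files.readAllBytes` rather than FileReader, and it assumes the output is UTF-8.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

class TesseractCli {

    // Builds the command line "tesseract <image> <outputBase>".
    // Assumes the tesseract executable is on the PATH.
    static List<String> buildCommand(Path image, Path outputBase) {
        return Arrays.asList("tesseract", image.toString(), outputBase.toString());
    }

    // Writes the in-memory image to a temp file, runs tesseract on it,
    // and returns the contents of the text file that tesseract produces.
    static String recognize(byte[] imageBytes) throws IOException, InterruptedException {
        Path dir = Files.createTempDirectory("ocr");
        Path image = dir.resolve("a.tif");
        Path outputBase = dir.resolve("out");
        Path textFile = dir.resolve("out.txt"); // tesseract appends ".txt" to the output base
        try {
            // Step 1: save the in-memory image to a file.
            Files.write(image, imageBytes);

            // Step 2: run the command-line program and wait for it to finish.
            // inheritIO() avoids blocking on an unread stdout/stderr pipe.
            Process process = new ProcessBuilder(buildCommand(image, outputBase))
                    .inheritIO()
                    .start();
            int exitCode = process.waitFor();
            if (exitCode != 0) {
                throw new IOException("tesseract exited with code " + exitCode);
            }

            // Step 3: read the text output back into Java.
            return new String(Files.readAllBytes(textFile), StandardCharsets.UTF_8);
        } finally {
            // Clean up the temp files whether or not OCR succeeded.
            Files.deleteIfExists(textFile);
            Files.deleteIfExists(image);
            Files.deleteIfExists(dir);
        }
    }
}
```

Timing a call such as `TesseractCli.recognize(imageBytes)` with `System.nanoTime()` before and after, and comparing it with the time tesseract itself reports spending on recognition, would give a rough idea of how much the file and process overhead actually costs.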
How much of a performance improvement would I get by writing a JNI wrapper for Tesseract? Unfortunately, there is no open source JNI wrapper that runs on Linux, so I would have to write it myself, and I wonder whether it's worth the development cost.