API for reading text from an image file using OCR

I am looking for sample code or the name of an API for OCR (Optical Character Recognition) in Java that I can use to extract all the text from an image file, without comparing it against any image, which is what I am doing with the code below.

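    import java.io.File;

    // OCR, ImageBinaryGrey and Capture come from the OCR library in use;
    // their imports are omitted here, as in the original post.
    public class OCRTest {
        static String STR = "";

        public static void main(String[] args) {
            // 0.70f is the value passed to the OCR constructor in the original post
            OCR l = new OCR(0.70f);
            l.loadFontsDirectory(OCRTest.class, new File("fonts"));
            l.loadFont(OCRTest.class, new File("fonts", "font_1"));
            ImageBinaryGrey i = new ImageBinaryGrey(Capture.load(OCRTest.class, "full.png"));
            // Recognize text inside the given rectangle using the "font_1" font set
            STR = l.recognize(i, 1285, 654, 1343, 677, "font_1");
            System.out.println(STR);
        }
    }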
+7
java ocr
3 answers

You can try javaocr at sourceforge: http://javaocr.sourceforge.net/

There is also a great example with an applet that uses Encog: http://www.heatonresearch.com/articles/42/page1.html

However, OCR is computationally expensive, so if you are planning heavy use, you should look at OCR libraries written in C and integrate them with Java.

OCR is hard, so do not forget to pin down your requirements before going too deep.

Tesseract and OpenCV (with JavaCV for integration, for example) are common choices. There are also commercial solutions such as ABBYY FineReader Engine and ABBYY Cloud OCR SDK.

+8

You can try Tess4j or JavaCPP Presets for Tesseract. I prefer the latter, since it is easier than the former. Add the dependency to your pom:

    <dependency>
        <groupId>org.bytedeco.javacpp-presets</groupId>
        <artifactId>tesseract-platform</artifactId>
        <version>3.04.01-1.3</version>
    </dependency>

And here is the simple code:

    import org.bytedeco.javacpp.*;
    import static org.bytedeco.javacpp.lept.*;
    import static org.bytedeco.javacpp.tesseract.*;

    public class BasicExample {
        public static void main(String[] args) {
            BytePointer outText;

            TessBaseAPI api = new TessBaseAPI();
            // Initialize tesseract-ocr with English, without specifying tessdata path
            if (api.Init(null, "eng") != 0) {
                System.err.println("Could not initialize tesseract.");
                System.exit(1);
            }

            // Open input image with leptonica library
            PIX image = pixRead(args.length > 0 ? args[0] : "/usr/src/tesseract/testing/phototest.tif");
            api.SetImage(image);

            // Get OCR result
            outText = api.GetUTF8Text();
            System.out.println("OCR output:\n" + outText.getString());

            // Destroy used object and release memory
            api.End();
            outText.deallocate();
            pixDestroy(image);
        }
    }

Tess4j is a little more complicated because it requires a specific Visual C++ Redistributable package to be installed.

+4

An open-source OCR engine is available from Google. It can be run from the command line (CMD), and you can easily drive the command line from Java, for example in a web application.
See https://www.youtube.com/watch?v=Mjg4yyuuqr5E for step-by-step details on OCR processing from the command line.
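To make the "drive the command line from Java" idea concrete, here is a minimal sketch using only the JDK. The answer does not name the engine, so this assumes it is Tesseract, that its `tesseract` executable is on the PATH, and that the English language data is installed. The class and method names (`TesseractCli`, `buildCommand`, `recognize`) are made up for illustration, and `full.png` is just the file name from the question.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: runs the tesseract command-line tool (assumed to be on the PATH).
final class TesseractCli {

    // Builds the command: tesseract <image> stdout -l <language>.
    // Passing "stdout" as the output base makes tesseract print the text
    // instead of writing it to a file.
    static List<String> buildCommand(String imagePath, String language) {
        return List.of("tesseract", imagePath, "stdout", "-l", language);
    }

    // Runs the command and returns everything it printed on standard output.
    static String recognize(String imagePath, String language)
            throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(buildCommand(imagePath, language));
        // Send diagnostics to our own stderr so they do not mix with the recognized text.
        builder.redirectError(ProcessBuilder.Redirect.INHERIT);
        Process process = builder.start();

        String text;
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            text = reader.lines().collect(Collectors.joining(System.lineSeparator()));
        }

        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("tesseract exited with code " + exitCode);
        }
        return text;
    }

    public static void main(String[] args) throws Exception {
        String image = args.length > 0 ? args[0] : "full.png";
        System.out.println(recognize(image, "eng"));
    }
}
```

Starting a new process per image is simple but slow, so for heavy use an in-process binding such as the ones in the other answers is probably a better fit.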

+2