Tesseract: change the location of a language file

I am creating an AIR project that needs some OCR features, so I decided to use Tesseract (for now I'm trying to get it working on Windows).

My problem is that I cannot change the location of the language file. Tesseract always looks in its installation directory (C:\Program Files (x86)\Tesseract-OCR\tessdata\mylang.traineddata).

Is there a way to configure Tesseract to look for this file in a location I choose, for example in the same folder as tesseract.exe? I do not want to (and perhaps cannot even) install the application with the AIR installer. I tried this with version 3.0 and with the latest SVN version.

thanks

+7
3 answers

I solved the problem by modifying the Tesseract source code (I'm using SVN revision 597). As nguyenq said, Tesseract looks for its data under the path given by the TESSDATA_PREFIX environment variable. If that variable is not set, it falls back to some tricks that I do not fully understand :). So if you need a portable version of Tesseract (one that does not depend on a Tesseract installation), edit mainblk.cpp around line 60. This is my version:

 // remove the stuff that Tesseract does to find the installation path
 /*
 if (!getenv("TESSDATA_PREFIX")) {
 #ifdef TESSDATA_PREFIX
 #define _STR(a) #a
 #define _XSTR(a) _STR(a)
   datadir = _XSTR(TESSDATA_PREFIX);
 #undef _XSTR
 #undef _STR
 #else
   if (argv0 != NULL) {
     if (getpath(argv0, dll_module_name, datadir) < 0)
 #ifdef __UNIX__
       CANTOPENFILE.error("main", ABORT, "%s to get path", argv0);
 #else
       NO_PATH.error("main", DBG, NULL);
 #endif
   } else {
     datadir = "./";
   }
 #endif
 } else {
   datadir = getenv("TESSDATA_PREFIX");
 }
 */
 datadir = "./";  // look for config things in the same folder as the executable.

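Now you can ship the language files in a tessdata folder next to the Tesseract executable. Note that "./" is resolved against the current working directory, so this works only when Tesseract is started from the folder that contains the executable.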
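
A less invasive variant keeps the environment variable and only changes the fallback. The following is a minimal sketch, assuming a C++17 compiler; the helper name resolve_datadir is hypothetical and is not part of Tesseract. It honors TESSDATA_PREFIX when it is set and otherwise uses the directory part of argv[0].

```cpp
#include <cstdlib>
#include <filesystem>
#include <string>

// Hypothetical helper (not part of Tesseract): choose the directory that
// is expected to contain the tessdata folder.
std::string resolve_datadir(const char* argv0) {
    // An explicit, non-empty TESSDATA_PREFIX always wins.
    const char* prefix = std::getenv("TESSDATA_PREFIX");
    if (prefix != nullptr && *prefix != '\0') {
        return prefix;
    }
    // Otherwise use the folder of the executable, taken from argv[0].
    if (argv0 != nullptr) {
        std::filesystem::path dir = std::filesystem::path(argv0).parent_path();
        if (!dir.empty()) {
            return dir.generic_string() + "/";
        }
    }
    // Last resort: the current working directory.
    return "./";
}
```

Keep in mind that argv[0] may carry no directory at all when the program is found through PATH, in which case this sketch still ends up at "./".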

+2

Yes, you can, by setting the TESSDATA_PREFIX environment variable. For example:

 export TESSDATA_PREFIX=/usr/local/share/

Please note that the directory path must end with /.
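
If the value is produced by a program rather than typed by hand, the trailing separator is easy to forget. A minimal sketch of a guard, where with_trailing_slash is a hypothetical helper name and not a Tesseract function:

```cpp
#include <string>

// Hypothetical helper: make sure a directory path ends with a separator
// before it is used as the value of TESSDATA_PREFIX.
std::string with_trailing_slash(std::string dir) {
    // Leave empty strings alone and accept both '/' and '\' as separators.
    if (!dir.empty() && dir.back() != '/' && dir.back() != '\\') {
        dir += '/';
    }
    return dir;
}
```

For example, with_trailing_slash("/usr/local/share") yields "/usr/local/share/", while a path that already ends with a separator is returned unchanged.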

+10

I suggest not relying on TESSDATA_PREFIX for the tessdata path. You can pass the path when Tesseract is initialized instead. If you run tesseract.exe from the command line, use the following syntax:

 tesseract.exe --tessdata-dir tessdataPath image.png output -l eng 
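
When another program launches tesseract.exe, passing each argument as a separate item avoids quoting problems with paths that contain spaces, such as "Program Files (x86)". A minimal sketch, assuming a Tesseract build that supports --tessdata-dir (the 3.0 release mentioned in the question may not); tesseract_args is a hypothetical helper name:

```cpp
#include <string>
#include <vector>

// Hypothetical helper: build the argument list for a tesseract.exe call
// that points --tessdata-dir at a caller-chosen folder.
std::vector<std::string> tesseract_args(const std::string& tessdata_dir,
                                        const std::string& image,
                                        const std::string& output_base,
                                        const std::string& lang) {
    // Same order as the command line above: option, input, output, language.
    return {"--tessdata-dir", tessdata_dir, image, output_base, "-l", lang};
}
```

The resulting list can be handed to whatever process-launching facility the host application provides.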

If you use tesseract::TessBaseAPI, pass the path as the first argument of api->Init():

 api->Init(tessdataPath, language);  // e.g. api->Init("C:", "eng")
0