According to the Tesseract v3.03 release notes, Tesseract now supports generating a PDF file with searchable text, but I don't know how to use this feature in my code.
I am currently using tess-two for my Android app, and I'm wondering whether this feature also works on Android.
It would be great if you could give an example that uses the Tesseract API to render a PDF; I will then try to port the missing functions to the tess-two library.
Thanks in advance.
P.S. I see a pdfrenderer source file that handles writing the PDF output, but I don't know how to use it together with the base API.
Update: here is my attempt:
    #include <cstdio>
    #include <android/log.h>
    #include "baseapi.h"
    #include "renderer.h"

    // The PDF renderer is constructed with the tessdata directory.
    tesseract::TessResultRenderer* renderer =
        new tesseract::TessPDFRenderer(nat->api.GetDatapath());
    __android_log_print(ANDROID_LOG_ERROR, "Test_tesseract", "data path = %s",
                        nat->api.GetDatapath());

    // Run OCR on every page of the input image and feed the results to the renderer.
    if (!nat->api.ProcessPages(c_file_name, NULL, 0, renderer)) {
        __android_log_print(ANDROID_LOG_ERROR, "Test_tesseract", "process page failed");
        delete renderer;
        return;
    }

    // Fetch the rendered PDF bytes first; the buffer belongs to the renderer,
    // so it must be written out before the renderer is deleted.
    const char* data = NULL;
    int dataLength = 0;
    if (!renderer->GetOutput(&data, &dataLength)) {
        __android_log_print(ANDROID_LOG_ERROR, "Test_tesseract", "Cannot get output file");
        delete renderer;
        return;
    }

    FILE* fout = fopen(c_pdf_file_name, "wb");
    if (fout == NULL) {
        __android_log_print(ANDROID_LOG_ERROR, "Test_tesseract",
                            "Cannot create output file %s", c_pdf_file_name);
        delete renderer;
        return;
    }
    fwrite(data, 1, dataLength, fout);
    fclose(fout);  // always close the file, including after a successful write
    delete renderer;
My code fails at the ProcessPages call. After adding logging (I have trouble debugging with the NDK), I found that the PDF renderer's BeginDocument always returns false inside TessBaseAPI::ProcessPages in baseapi.cpp:
    if (renderer && !renderer->BeginDocument(kUnknownTitle)) {
        success = false;
    }
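To make the failure mode concrete, here is a simplified, self-contained model of that control flow. `MockPdfRenderer`, `AddPage` and `ProcessPagesModel` are hypothetical stand-ins, not Tesseract's `TessResultRenderer` or `TessBaseAPI::ProcessPages`, and the title string is a placeholder rather than the real value of `kUnknownTitle`. The model only assumes what the quoted check shows: if `BeginDocument` returns false, the whole call reports failure, and (in this simplification) no pages are added, so there is no output to fetch afterwards.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a result renderer; NOT the tesseract class.
class MockPdfRenderer {
 public:
  explicit MockPdfRenderer(bool can_begin) : can_begin_(can_begin) {}

  // Models BeginDocument(): returns false when document setup fails.
  bool BeginDocument(const char* title) {
    if (!can_begin_) return false;
    output_ = std::string("HEADER ") + title + "\n";  // placeholder text, not real PDF
    return true;
  }

  // Hypothetical per-page hook that appends recognized text.
  void AddPage(const std::string& text) { output_ += text + "\n"; }

  // Models GetOutput(): exposes the accumulated bytes, if any.
  bool GetOutput(const char** data, int* length) const {
    if (output_.empty()) return false;
    *data = output_.data();
    *length = static_cast<int>(output_.size());
    return true;
  }

 private:
  bool can_begin_;
  std::string output_;
};

// Mirrors the quoted check: a failed BeginDocument marks the run as failed.
bool ProcessPagesModel(const std::vector<std::string>& pages,
                       MockPdfRenderer* renderer) {
  bool success = true;
  if (renderer && !renderer->BeginDocument("Placeholder Title")) {
    success = false;
  }
  // Simplification: pages are only rendered when the document started.
  if (success && renderer) {
    for (const std::string& page : pages) renderer->AddPage(page);
  }
  return success;
}
```

With `MockPdfRenderer(false)`, `ProcessPagesModel` returns false and `GetOutput` has nothing to return, which matches the "process page failed" log line: the problem lies in whatever makes `BeginDocument` fail, not in the OCR itself.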
Am I missing something?
P.S. I use tess-two, which uses the C++ baseapi rather than the C API (capi).