I ran the following code to get per-symbol confidence values:
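#include <cstdlib>
#include <iostream>
#include <leptonica/allheaders.h>
#include <tesseract/baseapi.h>
#include <tesseract/resultiterator.h>

using namespace std;
using namespace tesseract;

int main(int argc, char **argv) {
    const char *lang = "eng";
    PIX *pixs;
    if ((pixs = pixRead(argv[1])) == NULL) {
        cout << "Unsupported image type" << endl;
        exit(3);
    }

    TessBaseAPI api;
    // Init() has to succeed before an image is handed to the API.
    int rc = api.Init(argv[0], lang);
    if (rc != 0) {
        cout << "Could not initialize tesseract" << endl;
        exit(4);
    }
    api.SetVariable("save_blob_choices", "T");
    api.SetPageSegMode(tesseract::PSM_SINGLE_WORD);
    api.SetImage(pixs);
    api.Recognize(NULL);

    // Walk the result symbol by symbol and print each confidence.
    ResultIterator *ri = api.GetIterator();
    if (ri != 0) {
        do {
            const char *symbol = ri->GetUTF8Text(RIL_SYMBOL);
            if (symbol != 0) {
                float conf = ri->Confidence(RIL_SYMBOL);
                cout << "\nnext symbol: " << symbol << " confidence: " << conf << "\n" << endl;
            }
            delete[] symbol;  // GetUTF8Text returns a new[]-allocated string
        } while (ri->Next(RIL_SYMBOL));
    }
    return 0;
}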
image link
The result obtained for the above image:
next symbol: N confidence: 72.3563
next symbol: B confidence: 72.3563
next symbol: E confidence: 69.9937
next symbol: T confidence: 69.9937
next symbol: R confidence: 69.9937
next symbol: confidence: 69.9937
next symbol: N confidence: 69.9937
next symbol: G confidence: 69.9937
next symbol: - confidence: 69.9937
next symbol: I confidence: 69.9937
As you can see, the confidence values for the characters of the same word are identical. Is this the expected result? Shouldn't the confidence value be different for each character? I also tried running the code on a word in which every character was in a different font style. Even then, the confidence value was the same for all characters belonging to one word.
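To make the observation concrete, here is a minimal sketch that collapses the ten confidences from the output above into runs of equal values. `confidenceRuns` and `kReported` are hypothetical names I made up for illustration; they are not part of Tesseract, and the tolerance `eps` is an arbitrary assumption.

```cpp
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not a Tesseract API): collapses consecutive
// confidences that are equal within `eps` into (value, run length) pairs.
std::vector<std::pair<float, std::size_t>> confidenceRuns(
    const std::vector<float>& confs, float eps = 1e-4f) {
    std::vector<std::pair<float, std::size_t>> runs;
    for (float c : confs) {
        if (!runs.empty() && std::fabs(runs.back().first - c) <= eps) {
            ++runs.back().second;                 // same value: extend the run
        } else {
            runs.emplace_back(c, std::size_t{1}); // new value: start a run
        }
    }
    return runs;
}

// The ten confidences from the output above, in the order they were printed.
const std::vector<float> kReported = {
    72.3563f, 72.3563f, 69.9937f, 69.9937f, 69.9937f,
    69.9937f, 69.9937f, 69.9937f, 69.9937f, 69.9937f};
```

Calling `confidenceRuns(kReported)` gives two runs, one of length 2 and one of length 8, so the ten symbols carry only two distinct confidence values between them.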
Ekta