How to get character position in pytesseract

Question


I am trying to get the position of each character in an image using the pytesseract library.

    import pytesseract
    from PIL import Image

    print(pytesseract.image_to_string(Image.open('5.png')))

Is there a library that can return the position of each character?


python-2.7 image-processing ocr python-tesseract pytesser

Chandy alex Aug 24 '15 at 5:31


2 answers

el josso · Answer 1 · 2017-06-06T13:54:27+0000

pytesseract does not seem to be the best tool for getting positions, but it can be done:

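    from pytesseract import pytesseract

    # config="hocr" asks Tesseract for hOCR output, which carries bounding boxes.
    # This call uses the run_tesseract signature from the pytesseract release
    # current when the answer was written; later releases changed it.
    pytesseract.run_tesseract('image.png', 'output', lang=None, boxes=False, config="hocr")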
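Once the hOCR file exists, the positions still have to be pulled out of it. Here is a minimal sketch of that step using only the standard library. It assumes Tesseract-style markup in which each word is a span with class ocrx_word and a title attribute starting with "bbox x0 y0 x1 y1" (origin at the top-left of the image). The function name parse_hocr_words and the SAMPLE string are made up for illustration, and plain hOCR like this gives word boxes rather than per-character boxes.

```python
import re

# Matches one word span in Tesseract-style hOCR and captures its bbox and inner markup.
WORD_RE = re.compile(
    r"<span[^>]*class=['\"]ocrx_word['\"][^>]*"
    r"title=['\"]bbox (\d+) (\d+) (\d+) (\d+)[^'\"]*['\"][^>]*>(.*?)</span>",
    re.DOTALL,
)


def parse_hocr_words(hocr_text):
    """Return a list of (text, (x0, y0, x1, y1)) tuples, one per word span."""
    words = []
    for x0, y0, x1, y1, inner in WORD_RE.findall(hocr_text):
        # Drop nested tags such as <strong> and keep only the visible text.
        text = re.sub(r"<[^>]+>", "", inner).strip()
        words.append((text, (int(x0), int(y0), int(x1), int(y1))))
    return words


# Hand-written sample in the assumed format, not real Tesseract output.
SAMPLE = (
    "<span class='ocrx_word' id='word_1_1' "
    "title='bbox 36 92 96 116; x_wconf 90'>Hello</span> "
    "<span class='ocrx_word' id='word_1_2' "
    "title='bbox 109 92 185 116; x_wconf 88'><strong>world</strong></span>"
)

for text, bbox in parse_hocr_words(SAMPLE):
    print(text, bbox)
```

A regular expression is enough for a quick look at the output; for anything more robust an HTML or XML parser would probably be a safer choice.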

khushhall · Answer 2 · 2017-07-13T11:41:53+0000

The position of each character can be found as follows.

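    import csv
    import cv2
    from pytesseract import pytesseract as pt

    # boxes=True makes Tesseract write character boxes to output.box
    pt.run_tesseract('bw.png', 'output', lang=None, boxes=True, config="hocr")

    # Read the coordinates; each row is: char left bottom right top page
    # ('rb' is the Python 2 csv convention; open in text mode on Python 3)
    boxes = []
    with open('output.box', 'rb') as f:
        reader = csv.reader(f, delimiter=' ')
        for row in reader:
            if len(row) == 6:
                boxes.append(row)

    # Draw the bounding boxes; box-file y is measured from the bottom edge,
    # so flip it with the image height
    img = cv2.imread('bw.png')
    h, w, _ = img.shape
    for b in boxes:
        cv2.rectangle(img, (int(b[1]), h - int(b[2])), (int(b[3]), h - int(b[4])), (255, 0, 0), 2)

    cv2.imshow('output', img)
    cv2.waitKey(0)  # keep the window open until a key is pressed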
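The box-file step can also be done without csv and without OpenCV. The sketch below assumes the usual Tesseract box layout of "char left bottom right top page" per line, with y measured from the bottom of the image. The name parse_box_lines and the sample lines are hypothetical. Splitting from the right is a deliberate choice here: csv.reader treats a double quote as a quoting character, so a recognized " character could upset row parsing, while a plain split does not care.

```python
def parse_box_lines(lines, image_height):
    """Convert box-file lines to (char, left, top, right, bottom) tuples
    in image coordinates (origin at the top-left corner)."""
    result = []
    for line in lines:
        # Split from the right so the character itself is never interpreted.
        parts = line.rstrip("\n").rsplit(None, 5)
        if len(parts) != 6:
            continue  # skip malformed or empty lines
        char = parts[0]
        left, bottom, right, top = (int(p) for p in parts[1:5])
        # Box files count y upward from the bottom edge; flip to top-down.
        result.append((char, left, image_height - top, right, image_height - bottom))
    return result


# Hand-written sample lines in the assumed format, for a 100-pixel-high image.
sample_lines = [
    "H 10 70 20 90 0\n",
    '" 25 80 30 90 0\n',
    "\n",
]

for entry in parse_box_lines(sample_lines, image_height=100):
    print(entry)
```

Each tuple can be passed straight to a drawing call such as cv2.rectangle(img, (left, top), (right, bottom), ...), since the y values are already flipped.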

This method may miss some text. For best results, preprocess the image first (for example, with background subtraction).
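As a rough illustration of that kind of preprocessing, here is a pure-Python sketch that binarizes a grayscale image against its local mean. This is local-mean thresholding used as a simple stand-in for background subtraction, not the exact method the answer had in mind. The function name binarize_local_mean and the radius and offset defaults are assumptions chosen for the example, and the image is a plain list of rows so the snippet runs without OpenCV.

```python
def binarize_local_mean(gray, radius=1, offset=10):
    """Return a 0/255 image: 0 where a pixel is darker than the mean of its
    neighbourhood by more than `offset`, 255 everywhere else."""
    h = len(gray)
    w = len(gray[0]) if h else 0
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            # Neighbourhood clipped at the image border.
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            window = [gray[j][i] for j in ys for i in xs]
            local_mean = sum(window) / float(len(window))
            row.append(0 if gray[y][x] < local_mean - offset else 255)
        out.append(row)
    return out


# Tiny made-up image: uneven light background with one dark "ink" pixel.
tiny = [
    [200, 210, 220],
    [200, 50, 220],
    [200, 210, 220],
]

for row in binarize_local_mean(tiny):
    print(row)
```

On real images the same idea is much faster with an image library; this version only shows the logic of comparing each pixel with its surroundings so that an uneven background is flattened before OCR.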
