How to get character position in pytesseract

I am trying to get the position of each character in an image using the pytesseract library.

    import pytesseract
    from PIL import Image

    print(pytesseract.image_to_string(Image.open('5.png')))

Is there a library that can return the position of each character?

2 answers

pytesseract is probably not the best tool for getting positions, but it can be done:

    from pytesseract import pytesseract

    # Run Tesseract with the hocr config; 'output' is the base name of the output file
    pytesseract.run_tesseract('image.png', 'output', lang=None, boxes=False, config="hocr")
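The hOCR output stores positions in the `title` attribute of each element, as `bbox x0 y0 x1 y1` with the origin at the top-left corner. Note that it usually gives word-level boxes rather than character-level ones. Below is a minimal sketch of pulling those boxes out with a regular expression. The function name `parse_hocr_words` and the sample markup are my own illustration, and the pattern assumes the `class` attribute comes before `title`, as in the output I have seen.

```python
import html
import re

# Matches <span class='ocrx_word' ... title='bbox x0 y0 x1 y1; ...'>text</span>.
# Assumption: class appears before title inside the tag.
WORD_RE = re.compile(
    r"<span[^>]*class=['\"]ocrx_word['\"][^>]*"
    r"title=['\"][^'\"]*?bbox (\d+) (\d+) (\d+) (\d+)[^'\"]*['\"][^>]*>"
    r"(.*?)</span>",
    re.DOTALL,
)


def parse_hocr_words(hocr_text):
    """Return a list of (word, (x0, y0, x1, y1)) tuples from hOCR markup."""
    words = []
    for x0, y0, x1, y1, inner in WORD_RE.findall(hocr_text):
        # Drop nested tags such as <strong> and decode HTML entities.
        text = html.unescape(re.sub(r"<[^>]+>", "", inner)).strip()
        words.append((text, (int(x0), int(y0), int(x1), int(y1))))
    return words


# Hypothetical sample in the shape of Tesseract's hOCR output.
SAMPLE_HOCR = """
<span class='ocr_line' id='line_1_1' title="bbox 36 92 580 122">
 <span class='ocrx_word' id='word_1_1' title='bbox 36 92 96 116; x_wconf 95'>Hello</span>
 <span class='ocrx_word' id='word_1_2' title='bbox 109 92 190 116; x_wconf 93'>world</span>
</span>
"""

words = parse_hocr_words(SAMPLE_HOCR)
for text, box in words:
    print(text, box)
```

For a real file, read the generated hOCR file into a string and pass it to the function. A proper HTML parser would be more robust than a regular expression if the markup varies.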

The position of each character can be found as follows.

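    import csv
    import cv2
    from pytesseract import pytesseract as pt

    # Run Tesseract in box mode; this writes the character boxes to output.box
    pt.run_tesseract('bw.png', 'output', lang=None, boxes=True, config="hocr")

    # Read the coordinates: each row is "char left bottom right top page"
    boxes = []
    with open('output.box') as f:
        # QUOTE_NONE keeps a literal " character from being treated as a quote
        reader = csv.reader(f, delimiter=' ', quoting=csv.QUOTE_NONE)
        for row in reader:
            if len(row) == 6:
                boxes.append(row)

    # Draw the bounding boxes; box files use a bottom-left origin, so flip y
    img = cv2.imread('bw.png')
    h, w, _ = img.shape
    for b in boxes:
        img = cv2.rectangle(img, (int(b[1]), h - int(b[2])), (int(b[3]), h - int(b[4])), (255, 0, 0), 2)

    cv2.imshow('output', img)
    cv2.waitKey(0)  # keep the window open until a key is pressed
    cv2.destroyAllWindows()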
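The coordinate handling is the part that is easy to get wrong, so here it is on its own as a small sketch. Each line of a box file has the form `char left bottom right top page`, measured from the bottom-left corner of the image, while OpenCV and PIL measure from the top-left. The helper name `parse_boxes` and the sample text are made up for illustration. Newer pytesseract releases also have `pytesseract.image_to_boxes`, which as far as I know returns text in this same format, so the helper should work on its result as well.

```python
def parse_boxes(box_text, image_height):
    """Convert box-file text to (char, left, top, right, bottom) tuples.

    The returned coordinates use a top-left origin.
    """
    result = []
    for line in box_text.splitlines():
        # Split from the right so the character field is kept whole.
        parts = line.rsplit(" ", 5)
        if len(parts) != 6:
            continue  # skip blank or malformed lines
        char = parts[0]
        left, bottom, right, top = (int(v) for v in parts[1:5])
        # Flip the y axis: a large y in the box file is near the top of the image.
        result.append((char, left, image_height - top, right, image_height - bottom))
    return result


# Hypothetical box-file content for an image that is 100 pixels high.
SAMPLE_BOXES = "H 10 20 30 50 0\ni 35 20 40 48 0\n"

char_boxes = parse_boxes(SAMPLE_BOXES, image_height=100)
for entry in char_boxes:
    print(entry)
```

Each tuple can be passed straight to `cv2.rectangle` as `(left, top)` and `(right, bottom)`.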

This method may miss some of the text. For best results, preprocess the image first (for example, by subtracting the background).
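To show what I mean by preprocessing, here is a rough sketch in plain Python on a grayscale image given as a list of rows. It assumes the background is roughly uniform, so the most common gray level is taken as the background; the function name `subtract_background` and the `tolerance` value are only illustrative. On a real image you would more likely use a library routine such as OpenCV's `cv2.threshold` with Otsu's method.

```python
from collections import Counter


def subtract_background(gray, tolerance=40):
    """Binarize a grayscale image: 255 for background, 0 for text.

    gray is a list of rows, each a list of 0-255 intensity values.
    """
    # Assumption: the most frequent gray level is the background.
    counts = Counter(p for row in gray for p in row)
    background = counts.most_common(1)[0][0]
    return [
        [255 if abs(p - background) <= tolerance else 0 for p in row]
        for row in gray
    ]


# Tiny made-up image: light background with two dark "ink" pixels.
sample = [
    [200, 200, 200],
    [200, 30, 200],
    [198, 200, 35],
]

cleaned = subtract_background(sample)
for row in cleaned:
    print(row)
```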
