I know this is a very late answer, but I had the same problem when working with OS X Lion, Python 2.7, OpenCV 2.4 and Tesseract 3.0.
I found three problems when debugging with Eclipse + PyDev:
First, the image given to tesseract should always be a binary image.
Second, cv2.imread() sometimes fails and just returns an empty ndarray. I really don't know why this happens, or whether it is related to running Python under Eclipse + PyDev.
So when I used the shape and dtype of that empty ndarray, I got null characters, and cv.SetData() created an empty iplimage. When api.GetUTF8Text() then tried to work on this empty iplimage, everything broke in a strange way and I got "Segmentation Error 11".
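One way to catch the empty-image case before it ever reaches Tesseract is to check what cv2.imread() returned. The helper below is only a sketch: require_image is a hypothetical name, not part of OpenCV or python-tesseract, and it assumes the result is either None or an object with a NumPy-style size attribute.

```python
def require_image(img, path):
    """Raise early if an image failed to load.

    Hypothetical helper, not an OpenCV function. `img` is whatever
    cv2.imread(path, 0) returned; `path` is only used in the message.
    """
    # Covers both a None result and an array with zero elements.
    if img is None or getattr(img, "size", 0) == 0:
        raise IOError("could not read image: %r" % (path,))
    return img
```

It could be used as scr = require_image(cv2.imread('textSample.jpg', 0), 'textSample.jpg'), so that a failed read raises a normal Python exception instead of handing an empty buffer to cv.SetData().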
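Third, it turns out that cv2 and cv handle arrays in completely different ways, and the axis order is reversed between them: a cv2 (NumPy) array has shape (height, width), while cv.CreateImageHeader() expects (width, height). So, if you did something like ...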
    image = cv2.imread('something.jpg', 0)  # flag 0 loads the image as grayscale
    _, image = cv2.threshold(image, 128, 255, cv2.THRESH_BINARY)  # returns (retval, image)
    height, width = image.shape  # a grayscale array has no channel axis
then you need to do ...
    iplimage = cv.CreateImageHeader((width, height), cv.IPL_DEPTH_8U, 1)
    cv.SetData(iplimage, image.tostring(), image.dtype.itemsize * width)
and I see that you have solved the third problem.
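To make the axis swap concrete, here is a small sketch of how the size and row-step arguments follow from the array's shape. ipl_header_args is a hypothetical helper written for illustration only, and it assumes a single-channel image whose rows are stored contiguously.

```python
def ipl_header_args(shape, itemsize):
    """Return ((width, height), step) for a single-channel image.

    `shape` is a NumPy-style (rows, cols) tuple, i.e. (height, width).
    `itemsize` is the number of bytes per pixel (1 for 8-bit data).
    Hypothetical helper, not part of OpenCV.
    """
    height, width = shape[:2]
    step = itemsize * width  # bytes per image row
    return (width, height), step


# A 640x480 8-bit grayscale image has ndarray shape (480, 640).
size, step = ipl_header_args((480, 640), 1)
# size == (640, 480), step == 640
```

The size tuple is what goes into cv.CreateImageHeader(), and step is the third argument of cv.SetData().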
Strangely, if I put a line like img0 = cv.fromarray(scr) right after the line scr = cv2.imread('textSample.jpg',0), then everything works fine.
Go through the following code with the image "textSample.jpg" (at the bottom of this post) and uncomment the commented lines (except the first line, which is the shebang rather than a comment) so you can see which image your code is working on. scr, img0 and img1 should all end up being the same image:
    #!/usr/bin/env python
    import cv2
    import cv2.cv as cv
    import tesseract

    scr = cv2.imread('textSample.jpg', 0)
    #img0 = cv.fromarray(scr)
    #cv.SaveImage('img0.jpg', img0)

    api = tesseract.TessBaseAPI()
    api.Init(".", "eng", tesseract.OEM_DEFAULT)
    api.SetPageSegMode(tesseract.PSM_AUTO)

    image = cv.CreateImageHeader((scr.shape[1], scr.shape[0]), cv.IPL_DEPTH_8U, 1)
    cv.SetData(image, scr.tostring(), scr.dtype.itemsize * scr.shape[1])
    #cv.SaveImage('img1.jpg', image)

    tesseract.SetCvImage(image, api)
    text = api.GetUTF8Text()
    conf = api.MeanTextConf()
    print text
    print conf
you should get something like ...
    OE3456789 !"#$%&'()* .-./
    76
