Question 2 is more or less answered here: http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3#Generate_Training_Images There is no need to train with multiple sizes; 10 point will do. (The exception to this is very small text. If you want to recognize text with an x-height of less than about 15 pixels, you should either train it specifically or scale the images before trying to recognize them.)
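As a rough sketch of the "scale the images" option for small text: the helper below works out the percentage by which to enlarge an image so the text reaches a chosen x-height. The name scale_percent and the 20-pixel target are my own assumptions, not anything from the Tesseract documentation; the only figure taken from it is that trouble starts below about 15 pixels.

```shell
# Hypothetical helper (not part of Tesseract or ImageMagick).
# Usage: scale_percent MEASURED_XHEIGHT TARGET_XHEIGHT
# Both arguments are assumed to be positive integers, in pixels.
# Prints the resize percentage; prints 100 if the text is already large enough.
scale_percent() {
    measured=$1
    target=$2
    if [ "$measured" -ge "$target" ]; then
        echo 100
    else
        # Integer ceiling of target * 100 / measured
        echo $(( (target * 100 + measured - 1) / measured ))
    fi
}

# Example: text with a 10-pixel x-height, aiming for an assumed 20 pixels.
pct=$(scale_percent 10 20)
echo "resize by ${pct}%"

# The resize itself could then be done with ImageMagick, for instance:
# convert small.tif -resize "${pct}%" scaled.tif
```

You would still have to measure the x-height yourself (for example in an image editor), and whether the enlarged image recognizes well enough is something to check per font.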
Questions 1 and 3: in my experience, training works well with fonts rendered at a resolution of 300 dpi without anti-aliasing. In particular, I used the following conversion options on the training PDF, which produced a satisfactory image:
convert -density 300 -depth 8 [input].pdf -background white -flatten +matte -compress none -monochrome [output].tif
However, when I later added a font to Tesseract, it recognized the characters correctly when I used an image with a resolution of 150 dpi. So I don’t think there is a general answer; it depends on the kind of fonts you are trying to add.