My suggestion: use the Ghostscript command line directly. ImageMagick uses Ghostscript in the background anyway (in IM terminology, Ghostscript is the "delegate" for some conversions, such as PDF -> TIFF).
Here is the command line that should work well for letter-format pages in a multi-page PDF file:
gswin32c.exe ^
  -o page_%03d.tif ^
  -sDEVICE=tiffg4 ^
  -r720x720 ^
  -g6120x7920 ^
  input.pdf
The -g... parameter sets the absolute width and height of the output pages in device pixels. At 720 dpi, 6120x7920 pixels works out to 8.5 x 11 inches (6120 / 720 = 8.5 and 7920 / 720 = 11), which is exactly letter size.
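If you want to try a different resolution or page size, the -g value has to be recomputed. Here is a minimal Python sketch of that arithmetic; the function name geometry_flag is made up for this illustration and is not part of Ghostscript.

```python
def geometry_flag(width_in: float, height_in: float, dpi: int) -> str:
    """Build a Ghostscript -g flag for a page size given in inches.

    Hypothetical helper for illustration only: pixels = inches * dpi.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return f"-g{width_px}x{height_px}"


# Letter (8.5 x 11 in) at 720 dpi, matching the command line above.
print(geometry_flag(8.5, 11, 720))  # -g6120x7920
# The same page at 300 dpi.
print(geometry_flag(8.5, 11, 300))  # -g2550x3300
```

Remember to change -r to the same resolution you pass in here, otherwise the page content will not fill the canvas as intended.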
These TIFF pages ...
- ... will be black + white,
- ... will have a resolution of 720 dpi,
- ... will be G4-compressed, and
- ... will be much smaller than the 300 dpi output of your IM command line.
Your IM -depth 8 parameter is not well suited to the later OCR step: it creates shades of gray around the letters, which do not help character recognition.
With the pure black + white, higher-resolution output, your OCR results should be noticeably better than before.
If your OCR cannot handle the TIFF G4 format (which I doubt), you can generate other TIFF subformats using Ghostscript. For instance:
gswin32c.exe ^
  -o page_%03d.tif ^
  -sDEVICE=tiffgray ^
  -r720x720 ^
  -g6120x7920 ^
  -sCompression=lzw ^
  input.pdf

gswin32c.exe ^
  -o page_%03d.tif ^
  -sDEVICE=tiff24nc ^
  -r720x720 ^
  -g6120x7920 ^
  -sCompression=lzw ^
  input.pdf
The tiffgray device generates 8-bit grayscale output. The tiff24nc device creates 24-bit RGB color output (8 bits per channel). Both types of TIFF will, of course, be larger than the tiffg4 output.
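To get a feeling for why the gray and color variants are larger, you can compare the raw raster size per page before any compression. This is only a rough sketch: the bits-per-pixel table follows from the device descriptions above, the helper name raw_page_bytes is made up, and real file sizes depend heavily on the compression and on the page content.

```python
WIDTH_PX, HEIGHT_PX = 6120, 7920  # letter page at 720 dpi, as in -g6120x7920

# Bits per pixel for each device: 1-bit b/w, 8-bit gray, 24-bit RGB.
BITS_PER_PIXEL = {"tiffg4": 1, "tiffgray": 8, "tiff24nc": 24}


def raw_page_bytes(device: str) -> int:
    """Uncompressed raster size of one page (hypothetical helper)."""
    bits = BITS_PER_PIXEL[device]
    row_bytes = (WIDTH_PX * bits + 7) // 8  # rows padded to whole bytes
    return row_bytes * HEIGHT_PX


for device in BITS_PER_PIXEL:
    size = raw_page_bytes(device)
    print(f"{device}: {size} bytes ({size / 1e6:.1f} MB) before compression")
```

So, before compression, a gray page carries 8 times and an RGB page 24 times as much raw data as the 1-bit page, and G4 compression is designed specifically for 1-bit images.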