Is there a way to extract text from PostScript files (.ps, .eps) using Java?

I am looking for a solution similar to PDFBox or Apache Tika for PDF files, but for PS files.

Thanks.

+4
2 answers

As James Black says, it's probably best to convert to PDF and use your familiar tools.

However, there is also pstotext, which is available, for example, as its own package in the Ubuntu universe repository.

Ghostscript itself also comes with ps2txt and ps2ascii, which can also do this.
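Since the question asks about Java, here is a minimal sketch of calling one of these tools from Java with ProcessBuilder. It assumes ps2ascii is installed and on the PATH, and that running it with a single input file writes the extracted text to standard output. The class name PsTextExtractor and its methods are made up for this illustration, and the UTF-8 decoding is a guess about the output encoding.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper class, not part of any library.
public class PsTextExtractor {

    // Builds the command line. Assumption: "ps2ascii <file>" prints the text to stdout.
    static List<String> buildCommand(Path psFile) {
        return List.of("ps2ascii", psFile.toString());
    }

    // Runs the external tool and returns whatever it printed on stdout.
    static String extractText(Path psFile) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(psFile))
                // Discard stderr so a chatty tool cannot block on a full pipe.
                .redirectError(ProcessBuilder.Redirect.DISCARD)
                .start();
        // Read stdout completely before waiting for the process to exit.
        // Assumption: the output can be decoded as UTF-8; adjust if your files differ.
        String text = new String(process.getInputStream().readAllBytes(),
                StandardCharsets.UTF_8);
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("ps2ascii exited with code " + exitCode);
        }
        return text;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(extractText(Path.of(args[0])));
    }
}
```

Swapping in pstotext should only require changing the command in buildCommand, but check that tool's own options first.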

+1

You can use Ghostscript (http://www.osalt.com/ghostscript) to convert the file to PDF; there are then various libraries for processing PDFs.

This has the advantage that you only ever need to extract text from PDF files, so you can handle other formats too, as long as you can convert them to PDF.
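As a rough sketch of the conversion step from Java, the following runs Ghostscript with its pdfwrite device. It assumes the executable is called gs and is on the PATH (on Windows it is usually named differently, for example gswin64c). The class name PsToPdfConverter and its methods are invented for this example. The resulting PDF can then be handed to whatever PDF library you already use, such as PDFBox.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper class, not part of any library.
public class PsToPdfConverter {

    // Builds a Ghostscript command that writes psFile as a PDF to pdfFile.
    static List<String> buildCommand(Path psFile, Path pdfFile) {
        return List.of(
                "gs",                          // assumption: executable name on this system
                "-q",                          // suppress startup messages
                "-dNOPAUSE",                   // do not pause after each page
                "-dBATCH",                     // exit when the input has been processed
                "-sDEVICE=pdfwrite",           // use the PDF output device
                "-sOutputFile=" + pdfFile,     // where to write the PDF
                psFile.toString());
    }

    // Runs the conversion and fails if Ghostscript reports an error.
    static void convert(Path psFile, Path pdfFile) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(psFile, pdfFile))
                .inheritIO()                   // show Ghostscript's messages, avoid blocked pipes
                .start();
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("gs exited with code " + exitCode);
        }
    }

    public static void main(String[] args) throws Exception {
        convert(Path.of(args[0]), Path.of(args[1]));
    }
}
```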

+1