Parse a PDF file and write its content to a text file using Java

How do I parse a PDF file and write its content to a text file using Java?

+7
java ms-word pdf
4 answers

To parse a PDF file in Java, you can use Apache PDFBox: http://incubator.apache.org/pdfbox/

To read and write Word (or other Office) file formats in Java, try Apache POI: http://poi.apache.org/

Both are free.
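A minimal sketch of the overall flow (extract the text, then write it out as UTF-8), using only the standard library. The class name `PdfToText` and the `TextExtractor` interface are hypothetical names made up for this illustration; the actual parsing is left to whichever PDF library you plug in. The PDFBox call shown in the comment assumes PDFBox 2.x is on the classpath.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class PdfToText {

    /** Hypothetical hook for the PDF library that does the actual parsing. */
    @FunctionalInterface
    interface TextExtractor {
        String extract(Path pdf) throws IOException;
    }

    /**
     * Extracts the text of the given PDF with the supplied extractor and
     * writes it to the target file as UTF-8. Returns the target path.
     */
    static Path convert(Path pdf, Path txt, TextExtractor extractor) throws IOException {
        String text = extractor.extract(pdf);
        // Files.write creates the file, or truncates it if it already exists.
        return Files.write(txt, text.getBytes(StandardCharsets.UTF_8));
    }

    // Assumption: with PDFBox 2.x available, an extractor could look like this:
    //
    //   TextExtractor pdfBox = pdf -> {
    //       try (PDDocument doc = PDDocument.load(pdf.toFile())) {
    //           return new PDFTextStripper().getText(doc);
    //       }
    //   };
    //   PdfToText.convert(Paths.get("in.pdf"), Paths.get("out.txt"), pdfBox);
}
```

Keeping the extractor behind a small interface means the file-writing part stays the same if you later switch from PDFBox to another library.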

+9

Try the iText Java library:

iText is an ideal library for developers who want to improve web and other applications through dynamic creation and/or manipulation of PDF documents.

You can use it for the parsing.

As for generating the text documents, the OpenOffice Java API can create Word-compatible documents (I have no personal experience with this API).

+5

You can try any of them:

Once you have read the contents of the PDF file, you can also save them in an ODT file or a text file. For an ODT file, try http://odftoolkit.openoffice.org.

Best!

+3

You can use iText if the source PDF is mostly text. Images and the like are quite difficult to handle during parsing. If it's just text, it takes about 10 lines of code. See the iText guide for examples.

For writing Word files there is Apache POI. It can be a little difficult to understand at first, but for such a simple task that should not be a problem.

0
