Is there a Java library to convert a document from PDF to HTML?

Question

Is there a Java library to convert a document from PDF to HTML?

An open source implementation would be preferred.

+6

java html pdf

broundee Dec 11 '08 at 10:49


3 answers

The only ones I know of are commercial (paid):

BFO
JPedal

+1

Kablam Dec 11 '08 at 11:08


Try PDFBox from Apache.

+1

dacracot Nov 04 '14 at 23:03


Philho · Accepted Answer · 2008-12-11T12:59:35+0000

Obviously, this is not an easy task: PDF formatting is much richer than HTML (plus you have to extract the images and link them, etc.).
Plain text extraction is much simpler (although still not trivial...).
I see a similar question in the sidebar of your question: Converting PDF to HTML with Python. It points to a library (poppler, which is apparently written in C++; maybe it can be accessed through JNI/JNA) and to a related question that offers even more answers.
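If binding to poppler through JNI/JNA is more than you need, one possible shortcut is to call poppler's pdftohtml command-line tool from Java. This is a different technique from the JNI/JNA idea above, and only a minimal sketch: it assumes pdftohtml (from poppler-utils) is installed and on the PATH, and the class and method names (PopplerPdfToHtml, buildCommand, convert) are made up for illustration. The exact names of the files that pdftohtml writes can vary between versions, so check the output directory rather than relying on them.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: runs poppler's pdftohtml as an external process.
class PopplerPdfToHtml {

    // Builds the argument list for pdftohtml.
    static List<String> buildCommand(Path pdf, Path htmlOut, boolean ignoreImages) {
        List<String> cmd = new ArrayList<>();
        cmd.add("pdftohtml");      // assumption: the binary is on the PATH
        cmd.add("-s");             // one HTML document covering all pages
        cmd.add("-noframes");      // do not generate a frameset
        if (ignoreImages) {
            cmd.add("-i");         // skip image extraction
        }
        cmd.add(pdf.toString());
        cmd.add(htmlOut.toString());
        return cmd;
    }

    // Runs the conversion and returns the exit code of the process.
    static int convert(Path pdf, Path htmlOut) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(pdf, htmlOut, false))
                .inheritIO()       // show the tool's messages on our console
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: PopplerPdfToHtml <input.pdf> <output.html>");
            return;
        }
        int exitCode = convert(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("pdftohtml exit code: " + exitCode);
    }
}
```

The trade-off is the usual one for shelling out: no native bindings to maintain, but the tool has to be installed on every machine that runs the code, and you only get the layout fidelity that pdftohtml itself provides.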
