Is there a Java library for converting a document from PDF to HTML?

An open-source implementation would be preferred.


This is not an easy task: PDF layout is much richer than HTML (plus you have to extract the images and link them, etc.).
Plain text extraction is much simpler (although not trivial either).
In the sidebar of your question I see a similar one, Converting PDF to HTML with Python. It points to a library (poppler, which is apparently written in C++; maybe it can be accessed through JNI/JNA) and links to a related question that offers even more answers.
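To illustrate why the text-only route is so much simpler: once some extractor has given you plain text, turning it into HTML is little more than escaping and wrapping paragraphs. The sketch below assumes the text has already been extracted by another tool (the class and method names are hypothetical, not from any library) and treats blank lines as paragraph breaks. Everything PDF-specific (positioning, fonts, images) is simply lost.

```java
import java.util.List;

class TextToHtml {

    // Escape the characters that are significant in HTML text and attribute values.
    public static String escapeHtml(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&#39;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    // Wrap already-extracted plain text in a minimal HTML page.
    // Assumption: one or more blank lines separate paragraphs; single line breaks
    // are kept inside the paragraph (the browser collapses them to spaces).
    public static String toHtml(String title, String plainText) {
        StringBuilder html = new StringBuilder();
        html.append("<!DOCTYPE html>\n<html><head><meta charset=\"UTF-8\"><title>")
            .append(escapeHtml(title))
            .append("</title></head><body>\n");
        for (String para : plainText.split("\\R\\s*\\R")) {
            String trimmed = para.strip();
            if (!trimmed.isEmpty()) {
                html.append("<p>").append(escapeHtml(trimmed)).append("</p>\n");
            }
        }
        html.append("</body></html>\n");
        return html.toString();
    }

    public static void main(String[] args) {
        String text = "Page one & intro.\n\nSecond <para>\nstill second.";
        System.out.print(toHtml("Demo", text));
    }
}
```

Anything beyond this (columns, tables, images, exact positioning) is where a real PDF-to-HTML converter has to do the hard work.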
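Instead of binding poppler through JNI/JNA, a simpler (swapped-in) approach is to call poppler's `pdftohtml` command-line tool as an external process from Java. This is a sketch under assumptions: the `pdftohtml` binary from poppler-utils must be installed and on the `PATH`, the class name is hypothetical, and how `pdftohtml` names its output files from the base name may vary by version, so check its man page. `-s` (single document) and `-noframes` (no frameset) are standard `pdftohtml` options.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;

class PopplerPdfToHtml {

    // Build the pdftohtml command line: -s puts all pages into one document,
    // -noframes avoids generating a frameset around it.
    public static List<String> buildCommand(Path pdf, Path outputBase) {
        return List.of("pdftohtml", "-s", "-noframes",
                pdf.toString(), outputBase.toString());
    }

    // Run pdftohtml as a child process (assumed to be on the PATH) and
    // report a non-zero exit status as an IOException.
    public static void convert(Path pdf, Path outputBase)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder(buildCommand(pdf, outputBase))
                .inheritIO()
                .start();
        int exit = p.waitFor();
        if (exit != 0) {
            throw new IOException("pdftohtml exited with status " + exit);
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: PopplerPdfToHtml <input.pdf> <output-base-name>");
            return;
        }
        convert(Path.of(args[0]), Path.of(args[1]));
    }
}
```

This keeps the Java side trivial at the cost of an external dependency; a JNI/JNA binding avoids the extra process but is considerably more work.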


The only ones I know of are paid (commercial):

BFO
JPedal


Try Apache PDFBox, from the Apache Software Foundation.

