Java: create a PDF or image from invalid HTML

I want to create a PDF (or an image in any common format: PNG, JPG, BMP, ...) from "invalid" HTML using Java. I searched for a solution and found a tool, iText, along with some tutorials in which iText creates a PDF file from HTML with Java.

In this (X)HTML-to-PDF tutorial for Java, the conversion works correctly for valid HTML, and I get a PDF file like this one. But when I tried to create a PDF from my own HTML, I got errors.

First of all, my HTML is not well-formed and, unfortunately, I cannot change it. I uploaded it here, and the W3C validator found 28 errors.

My options:

  • Clean up and fix my HTML first, then create the PDF from it.
  • Find another tool that works for my problem.
  • Your own suggestion (using Java).
  • As a last resort, use a different platform (.NET, PHP, Python, etc.) and call it through web services from my application.

Please help me with this. Thank you in advance.

3 answers

Try wkhtmltopdf. It uses a headless browser (WebKit) to render the HTML first and then generates the PDF file from it. I used it in one of my Java projects and it worked well.

It provides flexible command-line options. Here is a link to a list of the options and their usage. It also works for HTML that is not well-formed.
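A minimal sketch of calling wkhtmltopdf from Java with the standard `ProcessBuilder`, assuming the `wkhtmltopdf` executable (and its companion `wkhtmltoimage`, which covers the PNG/JPG/BMP case from the question) is installed and on the `PATH`. The class name `WkhtmlConverter`, the extension-based choice of executable, and the 60-second timeout are my own illustrative assumptions, not part of the answer above; only the basic `wkhtmltopdf <input> <output>` form is used, so add any extra options from the documentation as needed.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

public class WkhtmlConverter {

    // Builds the command line. Assumption: the executables are named
    // "wkhtmltopdf" / "wkhtmltoimage" and are on the PATH.
    // A ".pdf" output uses wkhtmltopdf; anything else uses wkhtmltoimage.
    static List<String> buildCommand(String inputHtml, String output) {
        boolean isPdf = output.toLowerCase().endsWith(".pdf");
        List<String> cmd = new ArrayList<>();
        cmd.add(isPdf ? "wkhtmltopdf" : "wkhtmltoimage");
        cmd.add(inputHtml); // local file path or URL; the HTML may be malformed
        cmd.add(output);
        return cmd;
    }

    // Runs the external converter and waits for it (hypothetical 60 s limit).
    static void convert(String inputHtml, String output)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(buildCommand(inputHtml, output));
        pb.redirectErrorStream(true);                        // merge stderr into stdout
        pb.redirectOutput(ProcessBuilder.Redirect.INHERIT);  // show tool output in our console
        Process p = pb.start();
        if (!p.waitFor(60, TimeUnit.SECONDS)) {
            p.destroyForcibly();
            throw new IOException("conversion timed out");
        }
        if (p.exitValue() != 0) {
            throw new IOException("converter exited with code " + p.exitValue());
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: java WkhtmlConverter <input.html|URL> <output.pdf|.png|.jpg|.bmp>");
            return;
        }
        convert(args[0], args[1]);
    }
}
```

For example, `java WkhtmlConverter page.html page.pdf` or `java WkhtmlConverter page.html page.png`. Running the tool as a separate process keeps the rendering engine out of the JVM. One caveat: depending on the version, wkhtmltopdf may return a non-zero exit code for recoverable load errors (such as a missing image) even though it still wrote the output file, so you may want to relax the exit-code check.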

You can use a tool like JTidy (http://jtidy.sourceforge.net/) to fix the HTML for you, and then run iText on JTidy's output.

You can use an HTML parser that handles broken HTML, such as jsoup.

Like JTidy, it can automatically produce valid HTML, but it also lets you manipulate the HTML DOM, so you can fix the biggest problems yourself in whatever way you want.

Source: https://habr.com/ru/post/1415943/
