iTextSharp HTML to PDF?

I would like to know whether iTextSharp can convert HTML to PDF. Everything I convert will be plain text, but unfortunately the iTextSharp documentation is very sparse, so I can't determine whether it will be a viable solution for me.

If it can't do this, can someone point me to some good, free .NET libraries that can take a simple HTML text document and convert it to PDF?

TIA.

+61
itextsharp html-to-pdf
May 12, '10 at 21:22
8 answers

After some digging, I found a good way to accomplish what I need with iTextSharp.

Here is some sample code in case it helps anyone else in the future:

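    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Net;
    using iTextSharp.text;
    using iTextSharp.text.html.simpleparser;
    using iTextSharp.text.pdf;

    protected void Page_Load(object sender, EventArgs e)
    {
        Document document = new Document();
        try
        {
            // Write the PDF to a file on disk
            PdfWriter.GetInstance(document, new FileStream("c:\\my.pdf", FileMode.Create));
            document.Open();

            // Download the HTML to convert
            WebClient wc = new WebClient();
            string htmlText = wc.DownloadString("http://localhost:59500/my.html");
            Response.Write(htmlText);

            // Parse the HTML into iTextSharp elements and add each one to the document
            List<IElement> htmlarraylist = HTMLWorker.ParseToList(new StringReader(htmlText), null);
            for (int k = 0; k < htmlarraylist.Count; k++)
            {
                document.Add(htmlarraylist[k]);
            }

            document.Close();
        }
        catch (Exception ex)
        {
            // Record the failure instead of silently swallowing it
            System.Diagnostics.Trace.TraceError(ex.ToString());
        }
    }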
+28
May 12 '10 at 10:32 a.m.

I came across the same question a few weeks ago, and this is what I found. This method does a quick dump of HTML to PDF. The resulting document will most likely need some formatting.

    private MemoryStream createPDF(string html)
    {
        MemoryStream msOutput = new MemoryStream();
        TextReader reader = new StringReader(html);

        // step 1: create the document object
        Document document = new Document(PageSize.A4, 30, 30, 30, 30);

        // step 2: create a writer that listens to the document
        // and writes the PDF to the memory stream
        PdfWriter writer = PdfWriter.GetInstance(document, msOutput);
        // keep the stream open after document.Close() so the caller can read it
        writer.CloseStream = false;

        // step 3: create a worker to parse the HTML
        HTMLWorker worker = new HTMLWorker(document);

        // step 4: open the document and start the worker on it
        document.Open();
        worker.StartDocument();

        // step 5: parse the HTML into the document
        worker.Parse(reader);

        // step 6: close the worker and the document
        worker.EndDocument();
        worker.Close();
        document.Close();

        // rewind so the caller reads from the start of the PDF
        msOutput.Position = 0;
        return msOutput;
    }
+65
May 14, '10 at 15:16

Here is what I got working with version 5.4.2 (installed from NuGet) to return a PDF response from an ASP.NET MVC controller. The output goes to a MemoryStream; that can be changed to a FileStream if necessary.

I am posting it here because it is a complete, current example of using iTextSharp for HTML -> PDF conversion (excluding images; I have not looked at those since my use case does not require them).

It uses iTextSharp's XMLWorkerHelper, so the incoming HTML must be valid XHTML, and you may need to fix up your input accordingly.

    using iTextSharp.text.pdf;
    using iTextSharp.tool.xml;
    using System.IO;
    using System.Web.Mvc;

    namespace Sample.Web.Controllers
    {
        public class PdfConverterController : Controller
        {
            [ValidateInput(false)]
            [HttpPost]
            public ActionResult HtmlToPdf(string html)
            {
                html = @"<?xml version=""1.0"" encoding=""UTF-8""?>
    <!DOCTYPE html PUBLIC ""-//W3C//DTD XHTML 1.0 Strict//EN"" ""http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"">
    <html xmlns=""http://www.w3.org/1999/xhtml"" xml:lang=""en"" lang=""en"">
        <head>
            <title>Minimal XHTML 1.0 Document with W3C DTD</title>
        </head>
        <body>
        " + html + "</body></html>";

                var bytes = System.Text.Encoding.UTF8.GetBytes(html);

                using (var input = new MemoryStream(bytes))
                {
                    var output = new MemoryStream(); // this MemoryStream is closed by FileStreamResult

                    var document = new iTextSharp.text.Document(iTextSharp.text.PageSize.LETTER, 50, 50, 50, 50);
                    var writer = PdfWriter.GetInstance(document, output);
                    writer.CloseStream = false;
                    document.Open();

                    var xmlWorker = XMLWorkerHelper.GetInstance();
                    xmlWorker.ParseXHtml(writer, document, input, null);
                    document.Close();
                    output.Position = 0;

                    return new FileStreamResult(output, "application/pdf");
                }
            }
        }
    }
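Since the input has to be valid XHTML, it can help to check the fragment before handing it to the converter. The following is only a minimal sketch: the class and method names (`XhtmlCheck`, `IsWellFormedFragment`) are hypothetical, and it tests XML well-formedness only, which is a necessary condition for valid XHTML rather than a full validation. It may also reject HTML entities such as `&nbsp;` that are not defined in plain XML.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

public static class XhtmlCheck
{
    // Hypothetical helper: returns true when the fragment parses as
    // well-formed XML once it is wrapped in a single root element.
    public static bool IsWellFormedFragment(string htmlFragment, out string error)
    {
        try
        {
            // The wrapper element stands in for the <body> the controller adds
            XDocument.Parse("<body>" + htmlFragment + "</body>");
            error = null;
            return true;
        }
        catch (XmlException ex)
        {
            // Unclosed tags such as <br> or <img ...> end up here
            error = ex.Message;
            return false;
        }
    }
}
```

For example, `<p>Hello<br/></p>` passes the check, while `<p>Hello<br></p>` fails because the `br` element is never closed.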
+11
Jul 11 '13 at 19:39

I would upvote mightymada's answer if I had the reputation. I just implemented an ASP.NET HTML-to-PDF solution using Pechkin. The results are wonderful.

There is a NuGet package for Pechkin, but as the other poster mentions in their blog ( http://codeutil.wordpress.com/2013/09/16/convert-html-to-pdf/ ; I hope they do not mind my reposting it), there was a memory leak, which is addressed in this fork:

https://github.com/tuespetre/Pechkin

The blog above contains specific instructions for including this package (it is a 32-bit DLL and requires .NET 4). Here is my code. The incoming HTML is actually assembled with the HTML Agility Pack (I am automating invoice generation):

    public static byte[] PechkinPdf(string html)
    {
        // Transform the HTML into PDF
        var pechkin = Factory.Create(new GlobalConfig());
        var pdf = pechkin.Convert(new ObjectConfig()
            .SetLoadImages(true).SetZoomFactor(1.5)
            .SetPrintBackground(true)
            .SetScreenMediaType(true)
            .SetCreateExternalLinks(true), html);

        // Return the PDF file
        return pdf;
    }

Again, thanks mightymada. Your answer is fantastic.

+10
Jan 07 '14 at 18:40

I prefer to use another library called Pechkin because it can convert non-trivial HTML (including HTML that uses CSS classes). This is possible because the library uses the WebKit layout engine, which is also used by browsers such as Chrome and Safari.

I wrote in detail about my experience with Pechkin on my blog: http://codeutil.wordpress.com/2013/09/16/convert-html-to-pdf/

+6
Nov 27 '13 at 9:54

The above code will certainly help convert HTML to PDF, but will fail if the HTML code has IMG tags with relative paths. The iTextSharp library does not automatically convert relative paths to absolute paths.

I tried the code above and added code to handle the IMG tags.

Here you can find the code for reference: http://www.am22tech.com/html-to-pdf/
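As a rough illustration of the idea (not the code from the linked page), relative `src` values can be rewritten to absolute URLs before the HTML is passed to iTextSharp. This is a minimal sketch: the class and method names (`ImagePathFixer`, `MakeImagePathsAbsolute`) are hypothetical, the base URL is assumed to be the address the HTML was loaded from, and a regular expression is used for brevity even though it will not cope with every form of markup.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ImagePathFixer
{
    // Group 1: the tag up to and including "src=", group 2: the quote
    // character, group 3: the URL between the quotes.
    private static readonly Regex ImgSrc = new Regex(
        @"(<img\b[^>]*?\ssrc\s*=\s*)([""'])(.*?)\2",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Hypothetical helper: resolves every quoted <img src> against baseUri.
    public static string MakeImagePathsAbsolute(string html, Uri baseUri)
    {
        return ImgSrc.Replace(html, m =>
        {
            Uri resolved;
            // Relative values are resolved against baseUri;
            // values that are already absolute are kept as they are.
            if (Uri.TryCreate(baseUri, m.Groups[3].Value, out resolved))
            {
                return m.Groups[1].Value + m.Groups[2].Value
                     + resolved.AbsoluteUri + m.Groups[2].Value;
            }
            // Leave the tag untouched when the value cannot be resolved
            return m.Value;
        });
    }
}
```

With a base of `http://localhost:59500/reports/my.html`, the tag `<img src="images/logo.png" />` becomes `<img src="http://localhost:59500/reports/images/logo.png" />`.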

+3
Oct 11 '11 at 18:13

iTextSharp can convert an HTML file to PDF.

Required namespaces for the conversion:

    using System;
    using System.IO;
    using iTextSharp.text;
    using iTextSharp.text.pdf;

And the code to convert and download the file:

    // Create a byte array that will eventually hold our final PDF
    Byte[] bytes;

    // Boilerplate iTextSharp setup here
    // Create a stream that we can write to, in this case a MemoryStream
    using (var ms = new MemoryStream())
    {
        // Create an iTextSharp Document which is an abstraction of a PDF but **NOT** a PDF
        using (var doc = new Document())
        {
            // Create a writer that is bound to our PDF abstraction and our stream
            using (var writer = PdfWriter.GetInstance(doc, ms))
            {
                // Open the document for writing
                doc.Open();

                // Read your HTML from a database or file here and store it in finalHtml
                string finalHtml = string.Empty;

                // XMLWorker also reads from a TextReader and not directly from a string
                using (var srHtml = new StringReader(finalHtml))
                {
                    // Parse the HTML
                    iTextSharp.tool.xml.XMLWorkerHelper.GetInstance().ParseXHtml(writer, doc, srHtml);
                }

                doc.Close();
            }
        }

        // After all of the PDF "stuff" above is done and closed but **before** we
        // close the MemoryStream, grab all of the active bytes from the stream
        bytes = ms.ToArray();
    }

    // Clear the response
    Response.Clear();
    MemoryStream mstream = new MemoryStream(bytes);

    // Define the response content type
    Response.ContentType = "application/pdf";

    // Set the PDF file name in the content-disposition header
    Response.AddHeader("content-disposition", "attachment;filename=invoice.pdf");
    Response.Buffer = true;
    mstream.WriteTo(Response.OutputStream);
    Response.End();
+3
May 31 '15 at 8:20

If you are converting HTML to PDF on the server side, you can use Rotativa:

 Install-Package Rotativa 

It is based on wkhtmltopdf, has better CSS support than iTextSharp, and is very easy to integrate with MVC (which is what most people use), since you can simply return the view as a PDF:

    public ActionResult GetPdf()
    {
        //...
        return new ViewAsPdf(model); // and you are done!
    }
+1
Aug 10 '16 at 15:47