How to convert HTML to PDF using iTextSharp

I want to convert below HTML to PDF using iTextSharp, but don't know where to start:

<style> .headline{font-size:200%} </style> <p> This <em>is </em> <span class="headline" style="text-decoration: underline;">some</span> <strong>sample<em> text</em></strong> <span style="color: red;">!!!</span> </p> 
+53
c # pdf-generation itextsharp xmlworker
Aug 6 '14 at 15:23
source share
3 answers

Firstly, HTML and PDF are not related, although they were created at about the same time. HTML is designed to convey higher-level information, such as paragraphs and tables. Although there are ways to control it, the browser should ultimately use these higher-level concepts. PDF is designed to transfer documents and documents should “look” the same wherever they appear.

In an HTML document, you may have a paragraph that is 100% wide and depending on the width of your monitor, it may take 2 lines or 10 lines, and when it prints it may be 7 lines, and when you look at it on your phone 20 lines may be required. However, the PDF file must be independent of the rendering device, so regardless of screen size it should always display exactly the same.

Due to the musts above, PDF does not support abstract things such as “tables” or “paragraphs”. There are three main points that PDF supports: text, lines / shapes, and images. (There are other things, such as annotations and movies, but I try to make it simple here.) In PDF, you don't say "here is a paragraph, the browser does your thing!". Instead, you say: "Draw this text in this exact place X, Y, using this exact font, and don’t worry, I already calculated the width of the text, so I know that it will fit into this line." You also do not say “the table here”, but instead you say: “Draw this text in this exact place, and then draw a rectangle in this other exact place, which I have already calculated, so I know that it will be around the text.”

Secondly, iText and iTextSharp parse HTML and CSS. It. ASP.Net, MVC, Razor, Struts, Spring, etc. - all HTML frameworks, but iText / iTextSharp is 100% unaware of them. Same thing with DataGridViews, Repeaters, Templates, Views, etc., which are structure-specific abstractions. It is your responsibility to get the HTML from your choice of framework. IText will not help you. If you get an exception saying The document has no pages , or you think that “iText does not parse my HTML”, it’s almost certain that you don’t actually have HTML , you only think what you are doing.

Thirdly, the built-in class that has been around for many years is HTMLWorker , however it has been replaced by XMLWorker ( Java / .Net ). Zero work is done on HTMLWorker , which does not support CSS files and has limited support for the most basic CSS properties, but actually breaks down into specific tags . If you do not see the HTML attribute or CSS property and value in this file , then it is probably not supported by HTMLWorker . XMLWorker can sometimes be more complicated, but these complications also make extensible .

Below is the C # code that shows how to parse HTML tags in text abstractions that are automatically added to the document you are working on. C # and Java are very similar, so it’s relatively easy to convert them. Example # 1 uses embedded HTMLWorker to parse an HTML string. Since only inline styles are supported, class="headline" ignored, but everything else should work. Example # 2 is the same as the first, except that XMLWorker used XMLWorker . Example # 3 also parses a simple CSS example.

 //Create a byte array that will eventually hold our final PDF Byte[] bytes; //Boilerplate iTextSharp setup here //Create a stream that we can write to, in this case a MemoryStream using (var ms = new MemoryStream()) { //Create an iTextSharp Document which is an abstraction of a PDF but **NOT** a PDF using (var doc = new Document()) { //Create a writer that bound to our PDF abstraction and our stream using (var writer = PdfWriter.GetInstance(doc, ms)) { //Open the document for writing doc.Open(); //Our sample HTML and CSS var example_html = @"<p>This <em>is </em><span class=""headline"" style=""text-decoration: underline;"">some</span> <strong>sample <em> text</em></strong><span style=""color: red;"">!!!</span></p>"; var example_css = @".headline{font-size:200%}"; /************************************************** * Example #1 * * * * Use the built-in HTMLWorker to parse the HTML. * * Only inline CSS is supported. * * ************************************************/ //Create a new HTMLWorker bound to our document using (var htmlWorker = new iTextSharp.text.html.simpleparser.HTMLWorker(doc)) { //HTMLWorker doesn't read a string directly but instead needs a TextReader (which StringReader subclasses) using (var sr = new StringReader(example_html)) { //Parse the HTML htmlWorker.Parse(sr); } } /************************************************** * Example #2 * * * * Use the XMLWorker to parse the HTML. * * Only inline CSS and absolutely linked * * CSS is supported * * ************************************************/ //XMLWorker also reads from a TextReader and not directly from a string using (var srHtml = new StringReader(example_html)) { //Parse the HTML iTextSharp.tool.xml.XMLWorkerHelper.GetInstance().ParseXHtml(writer, doc, srHtml); } /************************************************** * Example #3 * * * * Use the XMLWorker to parse HTML and CSS * * ************************************************/ //In order to read CSS as a string we need to switch to a different constructor //that takes Streams instead of TextReaders. //Below we convert the strings into UTF8 byte array and wrap those in MemoryStreams using (var msCss = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(example_css))) { using (var msHtml = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(example_html))) { //Parse the HTML iTextSharp.tool.xml.XMLWorkerHelper.GetInstance().ParseXHtml(writer, doc, msHtml, msCss); } } doc.Close(); } } //After all of the PDF "stuff" above is done and closed but **before** we //close the MemoryStream, grab all of the active bytes from the stream bytes = ms.ToArray(); } //Now we just need to do something with those bytes. //Here I'm writing them to disk but if you were in ASP.Net you might Response.BinaryWrite() them. //You could also write the bytes to a database in a varbinary() column (but please don't) or you //could pass them to another function for further PDF processing. var testFile = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "test.pdf"); System.IO.File.WriteAllBytes(testFile, bytes); 



Update 2017

There is good news for HTML-to-PDF requirements. As this answer showed , the W3C css-break-3 standard will solve the problem ... This is a candidate’s recommendation with a plan to become the final recommendation this year after tests.

As a non-standard solution, there are solutions with plugins for C #, as shown by print-css.rocks .

+126
Aug 6 '14 at 15:23
source share

@ Chris Haas very well explained how to use itextSharp to convert HTML to PDF , very useful
my addition:
Using HtmlTextWriter , I put html tags inside the HTML table + inline CSS, I got my PDF file as I wanted, without using XMLWorker .
Edit : adding sample code:
ASPX page:

 <asp:Panel runat="server" ID="PendingOrdersPanel"> <!-- to be shown on PDF--> <table style="border-spacing: 0;border-collapse: collapse;width:100%;display:none;" > <tr><td><img src="abc.com/webimages/logo1.png" style="display: none;" width="230" /></td></tr> <tr style="line-height:10px;height:10px;"><td style="display:none;font-size:9px;color:#10466E;padding:0px;text-align:right;">blablabla.</td></tr> <tr style="line-height:10px;height:10px;"><td style="display:none;font-size:9px;color:#10466E;padding:0px;text-align:right;">blablabla.</td></tr> <tr style="line-height:10px;height:10px;"><td style="display:none;font-size:9px;color:#10466E;padding:0px;text-align:right;">blablabla</td></tr> <tr style="line-height:10px;height:10px;"><td style="display:none;font-size:9px;color:#10466E;padding:0px;text-align:right;">blablabla</td></tr> <tr style="line-height:10px;height:10px;"><td style="display:none;font-size:11px;color:#10466E;padding:0px;text-align:center;"><i>blablabla</i> Pending orders report<br /></td></tr> </table> <asp:GridView runat="server" ID="PendingOrdersGV" RowStyle-Wrap="false" AllowPaging="true" PageSize="10" Width="100%" CssClass="Grid" AlternatingRowStyle-CssClass="alt" AutoGenerateColumns="false" PagerStyle-CssClass="pgr" HeaderStyle-ForeColor="White" PagerStyle-HorizontalAlign="Center" HeaderStyle-HorizontalAlign="Center" RowStyle-HorizontalAlign="Center" DataKeyNames="Document#" OnPageIndexChanging="PendingOrdersGV_PageIndexChanging" OnRowDataBound="PendingOrdersGV_RowDataBound" OnRowCommand="PendingOrdersGV_RowCommand"> <EmptyDataTemplate><div style="text-align:center;">no records found</div></EmptyDataTemplate> <Columns> <asp:ButtonField CommandName="PendingOrders_Details" DataTextField="Document#" HeaderText="Document #" SortExpression="Document#" ItemStyle-ForeColor="Black" ItemStyle-Font-Underline="true"/> <asp:BoundField DataField="Order#" HeaderText="order #" SortExpression="Order#"/> <asp:BoundField DataField="Order Date" HeaderText="Order Date" SortExpression="Order Date" DataFormatString="{0:d}"></asp:BoundField> <asp:BoundField DataField="Status" HeaderText="Status" SortExpression="Status"></asp:BoundField> <asp:BoundField DataField="Amount" HeaderText="Amount" SortExpression="Amount" DataFormatString="{0:C2}"></asp:BoundField> </Columns> </asp:GridView> </asp:Panel> 

C # code:

 protected void PendingOrdersPDF_Click(object sender, EventArgs e) { if (PendingOrdersGV.Rows.Count > 0) { //to allow paging=false & change style. PendingOrdersGV.HeaderStyle.ForeColor = System.Drawing.Color.Black; PendingOrdersGV.BorderColor = Color.Gray; PendingOrdersGV.Font.Name = "Tahoma"; PendingOrdersGV.DataSource = clsBP.get_PendingOrders(lbl_BP_Id.Text); PendingOrdersGV.AllowPaging = false; PendingOrdersGV.Columns[0].Visible = false; //export won't work if there a link in the gridview PendingOrdersGV.DataBind(); //to PDF code --Sam string attachment = "attachment; filename=report.pdf"; Response.ClearContent(); Response.AddHeader("content-disposition", attachment); Response.ContentType = "application/pdf"; StringWriter stw = new StringWriter(); HtmlTextWriter htextw = new HtmlTextWriter(stw); htextw.AddStyleAttribute("font-size", "8pt"); htextw.AddStyleAttribute("color", "Grey"); PendingOrdersPanel.RenderControl(htextw); //Name of the Panel Document document = new Document(); document = new Document(PageSize.A4, 5, 5, 15, 5); FontFactory.GetFont("Tahoma", 50, iTextSharp.text.BaseColor.BLUE); PdfWriter.GetInstance(document, Response.OutputStream); document.Open(); StringReader str = new StringReader(stw.ToString()); HTMLWorker htmlworker = new HTMLWorker(document); htmlworker.Parse(str); document.Close(); Response.Write(document); } } 

of course include iTextSharp Refrences in the cs file

 using iTextSharp.text; using iTextSharp.text.pdf; using iTextSharp.text.html.simpleparser; using iTextSharp.tool.xml; 

Hope this helps!
thank

+7
Mar 27 '15 at 13:43 on
source share

Here is the link I used as a guide. Hope this helps!

Convert HTML to PDF with ITextSharp

 protected void Page_Load(object sender, EventArgs e) { try { string strHtml = string.Empty; //HTML File path -http://aspnettutorialonline.blogspot.com/ string htmlFileName = Server.MapPath("~") + "\\files\\" + "ConvertHTMLToPDF.htm"; //pdf file path. -http://aspnettutorialonline.blogspot.com/ string pdfFileName = Request.PhysicalApplicationPath + "\\files\\" + "ConvertHTMLToPDF.pdf"; //reading html code from html file FileStream fsHTMLDocument = new FileStream(htmlFileName, FileMode.Open, FileAccess.Read); StreamReader srHTMLDocument = new StreamReader(fsHTMLDocument); strHtml = srHTMLDocument.ReadToEnd(); srHTMLDocument.Close(); strHtml = strHtml.Replace("\r\n", ""); strHtml = strHtml.Replace("\0", ""); CreatePDFFromHTMLFile(strHtml, pdfFileName); Response.Write("pdf creation successfully with password -http://aspnettutorialonline.blogspot.com/"); } catch (Exception ex) { Response.Write(ex.Message); } } public void CreatePDFFromHTMLFile(string HtmlStream, string FileName) { try { object TargetFile = FileName; string ModifiedFileName = string.Empty; string FinalFileName = string.Empty; /* To add a Password to PDF -http://aspnettutorialonline.blogspot.com/ */ TestPDF.HtmlToPdfBuilder builder = new TestPDF.HtmlToPdfBuilder(iTextSharp.text.PageSize.A4); TestPDF.HtmlPdfPage first = builder.AddPage(); first.AppendHtml(HtmlStream); byte[] file = builder.RenderPdf(); File.WriteAllBytes(TargetFile.ToString(), file); iTextSharp.text.pdf.PdfReader reader = new iTextSharp.text.pdf.PdfReader(TargetFile.ToString()); ModifiedFileName = TargetFile.ToString(); ModifiedFileName = ModifiedFileName.Insert(ModifiedFileName.Length - 4, "1"); string password = "password"; iTextSharp.text.pdf.PdfEncryptor.Encrypt(reader, new FileStream(ModifiedFileName, FileMode.Append), iTextSharp.text.pdf.PdfWriter.STRENGTH128BITS, password, "", iTextSharp.text.pdf.PdfWriter.AllowPrinting); //http://aspnettutorialonline.blogspot.com/ reader.Close(); if (File.Exists(TargetFile.ToString())) File.Delete(TargetFile.ToString()); FinalFileName = ModifiedFileName.Remove(ModifiedFileName.Length - 5, 1); File.Copy(ModifiedFileName, FinalFileName); if (File.Exists(ModifiedFileName)) File.Delete(ModifiedFileName); } catch (Exception ex) { throw ex; } } 

You can download a sample file. Just put the html you want to convert to the files folder and run it. It will automatically generate a pdf file and put it in the same folder. But in your case, you can specify your html path in the htmlFileName variable.

-3
Sep 21 '16 at 5:15
source share



All Articles