Convert document formats in C#

What is the best way to convert between HTML, XML and XSL-FO in C#?

I already have HTML (it comes from FCKEditor) and I would like to output it as PDF (I have an XSL -> PDF converter). I just can't find a library that will convert from HTML to any XSL interface.

0
html c# xml xsl-fo
4 answers

A year or two ago, I had to create PDF files from a C++ / C# program. In the end, I decided to run Apache's Java-based FOP as a separate conversion process. The experience with XSL-FO was not pleasant. At that time there was not a single tool that fully implemented XSL-FO. Tools tended to pick a subset of the spec and hack around the rest. Given the growing complexity of XSL-FO, I'm starting to wonder whether there will ever be a full implementation.
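Running FOP as a separate process from C# can look roughly like the sketch below. It assumes the FOP launcher script is installed and that its documented `-fo <input> -pdf <output>` options are used; the class name `FopRunner`, the method names and the timeout are made up for illustration.

```csharp
using System;
using System.Diagnostics;

public static class FopRunner
{
    // Builds the command line for "fop -fo <input.fo> -pdf <output.pdf>".
    public static string BuildArguments(string foPath, string pdfPath) =>
        $"-fo \"{foPath}\" -pdf \"{pdfPath}\"";

    // Starts FOP, waits for it to finish and returns its exit code.
    // fopCommand is an assumption: the path to the fop launcher on your machine.
    public static int Run(string fopCommand, string foPath, string pdfPath, int timeoutMs = 60000)
    {
        var startInfo = new ProcessStartInfo
        {
            FileName = fopCommand,
            Arguments = BuildArguments(foPath, pdfPath),
            UseShellExecute = false,
            RedirectStandardError = true,
            CreateNoWindow = true
        };

        using var process = Process.Start(startInfo)
            ?? throw new InvalidOperationException("Could not start FOP.");

        // Read stderr asynchronously so a full pipe cannot block the child process.
        var stderrTask = process.StandardError.ReadToEndAsync();

        if (!process.WaitForExit(timeoutMs))
        {
            process.Kill();
            throw new TimeoutException("FOP did not finish in time.");
        }

        if (process.ExitCode != 0)
            Console.Error.WriteLine(stderrTask.Result);

        return process.ExitCode;
    }
}
```

A call such as `FopRunner.Run("fop", "resume.fo", "resume.pdf")` would then write the PDF next to the input, provided FOP accepts the document.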

FOP was generally buggy, and considerable time was spent working around problems. XSLT and XPath were hard to master. It took a few weeks before I could see past the verbosity and get things done quickly. I don't think I ever got my head around XSL-FO. It makes the HTML and CSS model look like a child's toy. Fortunately, the PDF files did get generated without too many problems. :-)

Anyway, on to the task: creating PDF files from the XHTML output of FCKEditor.

I just can't find a library that will convert from HTML to any XSL interface.

Heh. Yes, that is because there isn't one, and there will probably never be a good HTML to XSL-FO converter. Such a converter has several things going against it: the complexity of browsers and the complexity of XSL-FO. To deal with an average HTML document, it needs the guts of a web browser: layout, CSS support, possibly even JavaScript. Then it has to take the rendered page and work out which XSL-FO is needed to produce something similar that also fits within XSL-FO's page constraints.

It is like the problem of writing a Word viewer: without reusing large parts of Word itself, the result disappoints most of the time because documents do not look the same.

So ... what can you do? Well, restricting yourself to a small subset of HTML is a good start. Hopefully the output from FCKEditor is XHTML, since getting HTML into XML is a world of pain in itself (Tidy can help with that). Then, unless some poor soul has already written an FCKEditor XHTML to XSL-FO XSLT for your XSL-FO implementation, you will have to create it. That involves learning XSL-FO, XSLT and XPath. In my experience, this will take several weeks and it will be a powerful solution.
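To give an idea of what such an XHTML to XSL-FO transform involves, here is a minimal sketch using .NET's `XslCompiledTransform`. The stylesheet is a toy written for this example, not an existing FCKEditor stylesheet: it only knows paragraphs, bold and italic, and wraps every other top-level element in a plain block. It also assumes the input is well-formed XHTML in the XHTML namespace with no DOCTYPE and no HTML-only entities such as `&nbsp;`.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class XhtmlToFo
{
    // Toy stylesheet (an assumption for illustration): A4 page, p/strong/b/em/i only.
    private const string Stylesheet = @"
<xsl:stylesheet version='1.0'
    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
    xmlns:fo='http://www.w3.org/1999/XSL/Format'
    xmlns:h='http://www.w3.org/1999/xhtml'
    exclude-result-prefixes='h'>
  <xsl:template match='/h:html'>
    <fo:root>
      <fo:layout-master-set>
        <fo:simple-page-master master-name='page'
            page-height='297mm' page-width='210mm' margin='20mm'>
          <fo:region-body/>
        </fo:simple-page-master>
      </fo:layout-master-set>
      <fo:page-sequence master-reference='page'>
        <fo:flow flow-name='xsl-region-body'>
          <xsl:apply-templates select='h:body/*'/>
        </fo:flow>
      </fo:page-sequence>
    </fo:root>
  </xsl:template>
  <!-- Fallback: any unknown top-level element becomes a plain block. -->
  <xsl:template match='h:body/*' priority='-1'>
    <fo:block><xsl:apply-templates/></fo:block>
  </xsl:template>
  <xsl:template match='h:p'>
    <fo:block space-after='6pt'><xsl:apply-templates/></fo:block>
  </xsl:template>
  <xsl:template match='h:strong | h:b'>
    <fo:inline font-weight='bold'><xsl:apply-templates/></fo:inline>
  </xsl:template>
  <xsl:template match='h:em | h:i'>
    <fo:inline font-style='italic'><xsl:apply-templates/></fo:inline>
  </xsl:template>
</xsl:stylesheet>";

    // Applies the stylesheet to an XHTML string and returns the XSL-FO markup.
    public static string Transform(string xhtml)
    {
        var xslt = new XslCompiledTransform();
        using (var stylesheetReader = XmlReader.Create(new StringReader(Stylesheet)))
            xslt.Load(stylesheetReader);

        using var input = XmlReader.Create(new StringReader(xhtml));
        using var output = new StringWriter();
        using (var writer = XmlWriter.Create(output, xslt.OutputSettings))
            xslt.Transform(input, writer);

        return output.ToString();
    }
}
```

Every further tag, attribute and CSS property the editor can emit needs its own template, which is where the weeks go.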

To get started with XSL-FO, I found the following links useful:

So what is all this XSL-FO, XSLT and the rest? "XSL-FO: ready for prime time?" puts it this way:

The Extensible Stylesheet Language Family (XSL)

XSL is a family of recommendations for defining XML document transformation and presentation. It consists of three parts:

  • XSL Transformations (XSLT), a language for transforming XML
  • XML Path Language (XPath), an expression language used by XSLT to access or refer to parts of an XML document. (XPath is also used by the XML Linking specification).
  • XSL Formatting Objects (XSL-FO), an XML vocabulary for specifying formatting semantics

My advice? Run. Find another solution. Create LaTeX files and convert them to PDF files. Generate something else. Create text documents and print them using PDFCreator. Create images. Drive Firefox to print pages as PDF files. Find a way to avoid needing PDF files at all. Anything, as long as it is not a fight with HTML, XSL-FO, FOP, XSLT and XPath.

PS: Let me know if you need help. :-)

+3

I would try XSLT first. When you are talking about formatting XML documents (and that is pretty much what you are talking about), that is the tool for the job.

From the Wiki:

"The general idea behind XSL-FO's use is that the user writes a document, not in FO, but in an XML language. XHTML, DocBook and TEI are all possibilities, but it can be any XML language. Then the user obtains an XSLT transform, either by writing one themselves or by finding one for the document type in question. This XSLT transform converts the XML into XSL-FO."

You need an XSLT transform from HTML to XSL-FO. I'm not sure where to get one, but apparently the concept is not unheard of.

+1

Very informative exchange here. I built a web application with ASP.NET and C# for my IT contracting business. One of the main purposes of the web application is to generate custom resumes in various formats. I store the resume content in a SQL Server database and build the XML mostly in a C# method. I used XSLT to convert it to HTML and, with some effort, finally got a basic presentable resume.

My next goal was a printable version of the resume. I got an XML book from the library and dabbled in XSLT a bit. Then I reached the chapter on XSL-FO. That is when the iceberg hit. I wanted to offer PDF as a menu choice, going through XSLT to XSL-FO and on to PDF. The thing is, all of the book's recommendations point to commercial products. It just isn't worth the money, since PDF is not a requirement. I looked at Altova XMLSpy on a 30-day trial, but as soon as I tried my first conversion of an XSL-FO sample file, I got a message saying that I needed to download more software. That download took forever from their site, so I gave up and uninstalled the software. The free versions of commercial software from other vendors do not include a conversion option. After reading the notes here, I decided to avoid XSL-FO myself. I will try to produce an MS Word version instead, and if my clients want to convert it to PDF, they can pay for Adobe's PDF creation software.

0

This is a dead question, but I would like to add for future readers that the current version of FCKEditor (now CKEditor) is better at producing high-quality XHTML (possibly even restricted to a user-defined set of tags).

I ran into similar problems and solved them not with XSL-FO but with an (X)HTML to PDF converter that renders the PDF from your source without XSL transforms. I check the generated XHTML and fix the rare problems with HtmlAgilityPack; this gets you a long way past the complications of non-semantic HTML. There are many converters to choose from; my choice is wkhtmltopdf. (If money is not a problem, PrinceXML is an excellent alternative. I would love to use it, but it is too expensive.)
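A rough sketch of that pipeline, assuming the HtmlAgilityPack NuGet package is referenced and the wkhtmltopdf executable is on the PATH. The class `HtmlToPdf` and its method names are invented for this example, and the clean-up step shown here only re-serializes the markup as XML; real fixes would depend on what your editor emits.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using HtmlAgilityPack; // NuGet package, not part of .NET itself

public static class HtmlToPdf
{
    // Parses possibly sloppy HTML and writes it back out as well-formed XML.
    public static string CleanUp(string html)
    {
        var doc = new HtmlDocument
        {
            OptionFixNestedTags = true,
            OptionOutputAsXml = true
        };
        doc.LoadHtml(html);

        using var writer = new StringWriter();
        doc.Save(writer);
        return writer.ToString();
    }

    // Builds the command line "wkhtmltopdf --quiet <input.html> <output.pdf>".
    public static string BuildArguments(string htmlPath, string pdfPath) =>
        $"--quiet \"{htmlPath}\" \"{pdfPath}\"";

    // Cleans the HTML, writes it to a temporary file and lets wkhtmltopdf render it.
    public static void ConvertToPdf(string html, string pdfPath, string wkhtmltopdfPath = "wkhtmltopdf")
    {
        string htmlPath = Path.Combine(Path.GetTempPath(), Guid.NewGuid() + ".html");
        File.WriteAllText(htmlPath, CleanUp(html));
        try
        {
            var startInfo = new ProcessStartInfo
            {
                FileName = wkhtmltopdfPath,
                Arguments = BuildArguments(htmlPath, pdfPath),
                UseShellExecute = false,
                CreateNoWindow = true
            };

            using var process = Process.Start(startInfo)
                ?? throw new InvalidOperationException("Could not start wkhtmltopdf.");
            process.WaitForExit();

            if (process.ExitCode != 0)
                throw new InvalidOperationException(
                    $"wkhtmltopdf exited with code {process.ExitCode}.");
        }
        finally
        {
            // Remove the temporary input file whether or not the conversion worked.
            File.Delete(htmlPath);
        }
    }
}
```

Going through a temporary file keeps the example simple; the layout work is left to the converter's own rendering engine, so no XSLT or XSL-FO is involved.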

0
