On the surface, an easy question: how can I get a great PDF from my XML documents? My input is actually a subset of XHTML with a few custom attributes added (to store information about the sources of quotes, etc.). I have explored several routes and would like some feedback from anyone who has tried any of them before.
Note: I looked at XSL-FO for creating PDFs, but I have heard that the typographic quality of the open-source tools still lags far behind TeX. I guess the most advanced one is Apache FOP. But I'm really after great-looking PDFs (otherwise I could just use my browser's print dialog). Any thoughts or updates on this?
So I was thinking of using XSLT to convert my custom XML/XHTML dialect to DocBook and go from there (DocBook-to-HTML via XSLT seems to work quite well, so I could use that for HTML output too). But how do I get from DocBook to TeX? I came across a number of solutions:
- dblatex: a collection of XSLT stylesheets that output LaTeX.
- db2latex: started as a dblatex clone, but now offers tighter integration with LaTeX packages and provides a single script for PDF output, which is quite nice.
- PassiveTeX: instead of XSLT, it uses an XML parser written in TeX.
- TeXML: essentially an XML serialization of LaTeX that can be used as an intermediate format, plus an accompanying Python tool that converts from this XML format to LaTeX/ConTeXt. Its authors argue that this avoids the problems of existing solutions: mishandled special characters, lost braces or spaces, and support for the Latin-1 encoding only. (Is that still the case?)
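To make the special-character problem concrete: any route from XML text to LaTeX has to escape the ten characters that LaTeX treats specially in running text. A minimal Python sketch of that step, using only the standard library, might look like the following. The function names are hypothetical and not part of TeXML or any of the tools above; it also leaves non-ASCII characters untouched, which assumes a Unicode-aware engine such as XeTeX or LuaTeX downstream.

```python
import xml.etree.ElementTree as ET

# Replacements for the ten characters that are special in LaTeX text mode.
# The trailing "{}" after control words keeps TeX from swallowing a
# following space (e.g. "\textasciitilde{} foo" instead of "\textasciitilde foo").
LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "{": r"\{",
    "}": r"\}",
    "$": r"\$",
    "&": r"\&",
    "#": r"\#",
    "^": r"\textasciicircum{}",
    "_": r"\_",
    "%": r"\%",
    "~": r"\textasciitilde{}",
}


def escape_latex(text: str) -> str:
    """Escape LaTeX special characters in a single pass, one character at a time.

    A single pass matters: replacing "\\" first and then "{" / "}" would
    re-escape the braces introduced by "\\textbackslash{}".
    """
    return "".join(LATEX_SPECIALS.get(ch, ch) for ch in text)


def element_text_to_latex(xml_fragment: str) -> str:
    """Hypothetical helper: escape the full text content of one XML element.

    Inline markup (e.g. <em>) is flattened; only the text survives.
    """
    elem = ET.fromstring(xml_fragment)
    return escape_latex("".join(elem.itertext()))


if __name__ == "__main__":
    print(element_text_to_latex("<p>50% of $10 &amp; #1</p>"))
```

A real converter would of course map elements such as `<em>` to `\emph{...}` rather than flatten them; the sketch only covers the escaping of text nodes, which is where the "special characters and lost spaces" issues arise.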
Since my input XML can contain quite a few special characters represented in Unicode, the last point is especially important for me. I also thought about using XeTeX instead of pdfTeX to get around this problem. (Even if I lose some typographic quality, might it still be better than current open-source XSL-FO processors?) So db2latex and TeXML seem to be the favorites. Can anyone comment on their reliability?
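To check how much the Latin-1 limitation matters for a given document, a small Python sketch like the one below lists the characters that a Latin-1-only toolchain could not represent directly (ISO 8859-1 covers only code points U+0000 to U+00FF). It uses only the standard library's `xml.etree.ElementTree`; the function name is made up for illustration, and it assumes the input is well-formed XML with entities already resolved (XHTML named entities such as `&nbsp;` would need pre-processing, since ElementTree does not know them). It needs Python 3.9+ for the `list[str]` annotation.

```python
import xml.etree.ElementTree as ET


def non_latin1_chars(xml_source: str) -> list[str]:
    """Return the distinct characters outside Latin-1, sorted by code point.

    Both text content and attribute values are checked, since custom
    attributes (such as quote sources) may also contain non-Latin characters.
    """
    root = ET.fromstring(xml_source)
    text = "".join(root.itertext())
    attrs = "".join(v for elem in root.iter() for v in elem.attrib.values())
    # Latin-1 maps exactly U+0000..U+00FF, so anything above needs special handling.
    found = {ch for ch in text + attrs if ord(ch) > 0xFF}
    return sorted(found)


if __name__ == "__main__":
    sample = '<p title="\u0141\u00f3d\u017a">na\u00efve \u201cquoted\u201d</p>'
    print([f"U+{ord(ch):04X}" for ch in non_latin1_chars(sample)])
```

An empty result suggests pdfTeX with a Latin-1 input encoding would cope; a long list is an argument for XeTeX or LuaTeX with a suitable Unicode font.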
Alternatively, I could use ConTeXt directly, since interest in XML within the ConTeXt community seems to be quite significant. In particular, I could take a closer look at “My Way: Getting Web Content and PDF Output from One Source” and “Working with XML in ConTeXt MkIV”. Both documents describe an approach that uses ConTeXt together with LuaTeX. (DocBook In ConTeXt seems to do roughly the same thing, but its latest version dates from 2003.) The second document notes:
You may wonder why we do these manipulations in TEX and not in xslt instead. The advantage of an integrated approach is that it simplifies usage. Think not only of processing a document, but also of using xml to manage resources in the same run. An xslt approach is just as verbose (after all, you still need to produce TEX code) and probably less readable. In the case of MkIV, the integrated approach is also faster and gives us the option to manipulate content at runtime using Lua.
What do you think about this? Please keep in mind that I have some experience with XSLT and TeX, but I have never gone terribly deep into either. I have not tried many different LaTeX packages or alternatives such as ConTeXt (or XeTeX/LuaTeX instead of pdfTeX), but I am willing to learn some new things in order to eventually get my beautiful PDF files ;)
Also, I came across Pandoc, but could not find any information on how it compares with the other approaches mentioned. And finally, for reference: there is fairly extensive documentation on how to use TeXML with ConTeXt.