PHP-based LaTeX parser - where to start?

Project: I want to build a LaTeX-to-MathML translator in PHP. Why? Because I'm a mathematician, and I want to post math on my Drupal site. It does not have to translate all of LaTeX, since the basic document-level material is handled perfectly well by the CMS and will not be written in LaTeX in the first place; it just needs to translate math written in LaTeX into math written in MathML. Although I believe I have done my due diligence, it seems that no such tool exists. Maybe I'm wrong - if you know of something that would serve this purpose, by all means let me know. But, believing that it does not exist, I figure I have to write it myself.

Here's the thing, though: I have never done anything this ambitious. I don't know where to start. I have used PHP for many years, but only for the standard "build a CMS with PHP and MySQL" kind of work. I have never attempted anything as complex as translating from one language to another.

I'm just naive enough to consider tackling it with regular expressions. After all, LaTeX is a much more formal language than, say, HTML, and it does not allow nearly as many pathological edge cases. But, on the other hand, I'm smart enough to realize that this is probably a terrible idea: now I have two problems, and I'm sure I do not want to end up like this guy.
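To make the worry concrete, here is a minimal sketch of what goes wrong. The function name `naiveFrac` and the pattern are made up for illustration; they are not from any library. The pattern has no notion of brace depth, so it handles a flat fraction but splits a nested one at the wrong closing brace.

```php
<?php
// Hypothetical helper: translate \frac{..}{..} with a single naive regex.
function naiveFrac(string $tex): string
{
    // The lazy groups stop at the first closing brace they can;
    // nothing in the pattern counts nesting depth.
    return preg_replace(
        '/\\\\frac\{(.*?)\}\{(.*?)\}/',
        '<mfrac><mrow>$1</mrow><mrow>$2</mrow></mfrac>',
        $tex
    );
}

// Flat case: both arguments are captured correctly.
echo naiveFrac('\frac{a}{b}'), "\n";

// Nested case: the second argument is cut off after "\frac{b".
echo naiveFrac('\frac{a}{\frac{b}{c}}'), "\n";
```

The second call prints `<mfrac><mrow>a</mrow><mrow>\frac{b</mrow></mfrac>{c}}`, which is neither valid MathML nor the intended fraction. Matching braces correctly requires tracking depth, which is what a parser does.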

So, if that is not the way (right?), what is? How should I start thinking about this problem? Am I essentially writing a LaTeX compiler in PHP, and if so, what do I need to know to do that (for example, should I just read the Purple Dragon book first?)?

I am both very excited and rather intimidated by the prospect of this project, but, hey, we're all programmers here, right? If something we need does not exist, we go and build it; necessity is the mother of ... you get the idea. Many thanks in advance for anything you can offer.

+6
php parsing latex
6 answers

Do not write the parser yourself unless you want to do it as a learning experience. Just call existing LaTeX tools from PHP.

LaTeX2HTML is about as good as it gets, and here is an (old) description of a LaTeX to MathML converter from the maintainer of LaTeX2HTML.

+2

I actually had a go at this last year. I got something working, although I would not say it had any elegance or charm, and it was not fully functional.

If you only want to convert equations to MathML rather than do a full LaTeX conversion, you can use itex2MML. If you can load extensions into your PHP, you can compile itex2MML with PHP bindings and use it natively in scripts. The Makefiles may need a bit of hacking to get the configuration right.
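If loading an extension is not an option, calling a converter as an external process is another route. Below is a minimal sketch, not a tested integration. `runFilter` is a hypothetical helper name, and it assumes the converter reads its input on stdin and writes MathML to stdout. The commented-out usage line further assumes an `itex2MML` executable on the PATH and `$...$` delimiters around the math; check both against your installation.

```php
<?php
// Run an external command as a stdin -> stdout filter and return its output.
// $command is an argument list (PHP 7.4+), so no shell quoting is involved.
function runFilter(array $command, string $input): string
{
    $spec = [
        0 => ['pipe', 'r'], // child's stdin
        1 => ['pipe', 'w'], // child's stdout
        2 => ['pipe', 'w'], // child's stderr
    ];
    $proc = proc_open($command, $spec, $pipes);
    if (!is_resource($proc)) {
        throw new RuntimeException('Could not start ' . $command[0]);
    }

    // Fine for short formulas; very large inputs would need interleaved
    // reads and writes to avoid filling a pipe buffer.
    fwrite($pipes[0], $input);
    fclose($pipes[0]);

    $out = stream_get_contents($pipes[1]);
    $err = stream_get_contents($pipes[2]);
    fclose($pipes[1]);
    fclose($pipes[2]);

    $status = proc_close($proc); // exit code of the child process
    if ($status !== 0) {
        throw new RuntimeException("Converter failed ($status): $err");
    }
    return $out;
}

// Assumed usage, not verified here:
// $mathml = runFilter(['itex2MML'], '$x^2 + y^2 = z^2$');
```

Passing the command as an array avoids building a shell string from user input, which matters if the formulas come from site visitors.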

+2

Well, this answer was a mess.

Here's the cleaned-up version:

Since regex clearly does not cut it for a translator of this kind, you have two options, depending on your goals:

  • You just want to display LaTeX on your site, one way or another.
    • If this is what you need, there is a simpler solution than picking up an advanced book on compiler theory: find any way to get LaTeX onto your site, whether an existing translator or something else.

  • You are more curious and want to learn about compiler theory.
    • If so, I cannot recommend the Purple Dragon book (PDB) highly enough. It is a fascinating book, and you will learn a lot from it; after the first two chapters, you will know enough about lexical analysis to complete this project. The best money I have spent on an educational resource to date!
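
For the second option, the usual shape is a tokenizer followed by a recursive-descent parser. The sketch below is one possible starting point, not a complete translator. It covers only letters, numbers, a few operators, braces, `^`, `_` and `\frac`, and every name in it (`tokenize`, `MiniTexParser`, `texToMathML`) is invented for this example.

```php
<?php
// Split a formula into tokens: commands, numbers, letters, braces, ^ _, operators.
// Whitespace and any character outside this tiny subset are silently skipped.
function tokenize(string $tex): array
{
    preg_match_all('/\\\\[a-zA-Z]+|\d+(?:\.\d+)?|[a-zA-Z]|[{}^_]|[-+=()]/', $tex, $m);
    return $m[0];
}

final class MiniTexParser
{
    private array $tokens;
    private int $pos = 0;

    public function __construct(array $tokens) { $this->tokens = $tokens; }

    private function peek(): ?string { return $this->tokens[$this->pos] ?? null; }
    private function next(): ?string { return $this->tokens[$this->pos++] ?? null; }

    // Parse the whole input and reject leftovers such as a stray '}'.
    public function parseAll(): string
    {
        $out = $this->parseSequence();
        if ($this->peek() !== null) {
            throw new RuntimeException('Unexpected ' . $this->peek());
        }
        return $out;
    }

    // sequence := script*   (stops at '}' or at the end of input)
    private function parseSequence(): string
    {
        $out = '';
        while ($this->peek() !== null && $this->peek() !== '}') {
            $out .= $this->parseScript();
        }
        return $out;
    }

    // script := atom ('^' atom | '_' atom)*
    private function parseScript(): string
    {
        $base = $this->parseAtom();
        while (in_array($this->peek(), ['^', '_'], true)) {
            $tag = $this->next() === '^' ? 'msup' : 'msub';
            $base = "<$tag>" . $base . $this->parseAtom() . "</$tag>";
        }
        return $base;
    }

    // atom := '{' sequence '}' | '\frac' atom atom | number | letter | operator
    private function parseAtom(): string
    {
        $tok = $this->next();
        if ($tok === null) {
            throw new RuntimeException('Unexpected end of input');
        }
        if ($tok === '{') {
            $inner = $this->parseSequence(); // recursion handles nested groups
            if ($this->next() !== '}') {
                throw new RuntimeException('Missing closing brace');
            }
            return "<mrow>$inner</mrow>";
        }
        if ($tok === '\frac') {
            return '<mfrac>' . $this->parseAtom() . $this->parseAtom() . '</mfrac>';
        }
        if ($tok[0] === '\\' || in_array($tok, ['}', '^', '_'], true)) {
            throw new RuntimeException("Unsupported or misplaced token: $tok");
        }
        if (ctype_digit($tok[0])) {
            return "<mn>$tok</mn>";
        }
        if (ctype_alpha($tok)) {
            return "<mi>$tok</mi>";
        }
        return "<mo>$tok</mo>";
    }
}

function texToMathML(string $tex): string
{
    $parser = new MiniTexParser(tokenize($tex));
    return '<math xmlns="http://www.w3.org/1998/Math/MathML">' . $parser->parseAll() . '</math>';
}

echo texToMathML('\frac{a}{\frac{b}{c}}'), "\n";
```

Each grammar rule becomes one method, and nested braces fall out of the recursion in `parseAtom`. Note that regular expressions are still used here, but only to cut the input into tokens, which is the job they are suited for.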
0

If you are OK with converting formulas into images, there are many solutions. If you want MathML specifically, there are fewer. One alternative is jsMath, which uses JavaScript to render LaTeX in the browser. It is used by Sage and works well there.

0

Wikipedia uses a LaTeX-to-HTML (or image) translator written in OCaml. You can borrow some of the code from there or just use it as is.

0