Writing an LLVM JIT compiler with a parser generator (Bison / Antlr / Packrat / Elkhound)

The LLVM tutorials include instructions for writing a simple JIT compiler. Unfortunately, the lexer and parser in that tutorial are hand-written. That approach is useful for learning, but it is not suitable for writing complex, production-ready compilers. I know that GCC and several other "big" compilers use hand-written parsers, but I think parser generators give a big head start when writing your own compiler (especially when you are doing it alone, without a team of people).

Is it possible to use an existing parser generator such as Bison, Antlr, Packrat, or Elkhound together with LLVM to create a JIT compiler? I want to be able to "feed" the parser repeatedly (not just once at startup) with expressions and compile them at runtime.
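To make the intent concrete, here is roughly the loop I have in mind. This is only a sketch: `parseAndCodegen` and the entry-point name `"expr"` are placeholders for whatever the generated parser and my code generator would produce, and the ORC LLJIT calls follow recent LLVM versions.

```cpp
#include "llvm/ExecutionEngine/Orc/LLJIT.h"
#include "llvm/ExecutionEngine/Orc/ThreadSafeModule.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"
#include "llvm/Support/Error.h"
#include "llvm/Support/TargetSelect.h"
#include <iostream>
#include <memory>
#include <string>

// Placeholder: a generated parser (Bison/Antlr/...) builds an AST,
// and an IRBuilder-based code generator turns it into an llvm::Module
// containing one function named "expr".
static std::unique_ptr<llvm::Module>
parseAndCodegen(const std::string &Source, llvm::LLVMContext &Ctx) {
  (void)Source; (void)Ctx;
  return nullptr; // generated-parser + codegen work goes here
}

int main() {
  llvm::InitializeNativeTarget();
  llvm::InitializeNativeTargetAsmPrinter();

  auto JIT = llvm::cantFail(llvm::orc::LLJITBuilder().create());

  // The point of the question: this loop runs many times, not just once.
  for (std::string Line; std::getline(std::cin, Line);) {
    auto Ctx = std::make_unique<llvm::LLVMContext>();
    auto M = parseAndCodegen(Line, *Ctx);
    if (!M)
      continue;
    // Each new expression becomes a fresh module added to the running JIT.
    llvm::cantFail(JIT->addIRModule(
        llvm::orc::ThreadSafeModule(std::move(M), std::move(Ctx))));
    auto Sym = llvm::cantFail(JIT->lookup("expr"));
    (void)Sym; // cast to a function pointer of the right type and call it
  }
}
```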

Additionally, I have found many questions about the "best, modern" parser generator (for example, https://stackoverflow.com/questions/428892/what-parser-generator-do-you-recommend). If these tools can be used to build an LLVM JIT compiler, I would be grateful for any additional tips and advice on which tool would be better in terms of performance and flexibility in this particular case.

1 answer

There are many advantages to using a parser generator such as Bison or Antlr, especially while a language is still being developed. You will undoubtedly end up changing the grammar as you go, and you will want documentation of the final grammar. Tools that produce that documentation automatically from the grammar are really useful. They can also help you verify that the language's grammar is (a) what you think it is and (b) not ambiguous.

If your language (unlike C++) is actually LALR(1), or even better LL(1), and you use LLVM tools to build the AST and IR, then you are unlikely to need to do much more than write the grammar and provide some simple actions to construct the AST. That will keep you going for a while.
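For a sense of how small those "simple actions" can be, here is an illustrative sketch: the parser actions only allocate AST nodes, and the IR side is plain LLVM IRBuilder calls. The node layout and names are assumptions for the example, not a prescription.

```cpp
#include "llvm/ADT/APFloat.h"
#include "llvm/IR/Constants.h"
#include "llvm/IR/IRBuilder.h"
#include <memory>

// Illustrative AST nodes that a Bison/Antlr semantic action might build.
struct Expr {
  virtual ~Expr() = default;
  virtual llvm::Value *codegen(llvm::IRBuilder<> &B) = 0;
};

struct Number : Expr {
  double Val;
  explicit Number(double V) : Val(V) {}
  llvm::Value *codegen(llvm::IRBuilder<> &B) override {
    return llvm::ConstantFP::get(B.getContext(), llvm::APFloat(Val));
  }
};

struct Add : Expr {
  std::unique_ptr<Expr> L, R;
  Add(std::unique_ptr<Expr> L, std::unique_ptr<Expr> R)
      : L(std::move(L)), R(std::move(R)) {}
  llvm::Value *codegen(llvm::IRBuilder<> &B) override {
    // One IRBuilder call per operator; the parser generator never sees LLVM.
    return B.CreateFAdd(L->codegen(B), R->codegen(B), "addtmp");
  }
};

// A Bison action for `expr: expr '+' expr` would then be roughly:
//   { $$ = new Add(std::unique_ptr<Expr>($1), std::unique_ptr<Expr>($3)); }
```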

The usual reason people end up writing their own parsers, aside from the prejudice that "real programmers don't use parser generators", is that it is not easy to provide good syntax-error diagnostics, especially with LR(1) parsing. If that is one of your goals, you should try to make your grammar LL(k)-parsable (it is still not easy to provide good diagnostics with LL(k), but it is a little easier) and use an LL(k) framework such as Antlr.

There is another strategy: first parse the program text in the fastest way possible, using an LALR(1) parser (which is more flexible than LL(1)), without even trying to produce diagnostics. If that parse fails, you can parse the text again with a slower, possibly even backtracking, parser that does not bother building an AST but does keep track of source locations and tries to recover from syntax errors. Recovering from syntax errors without invalidating the AST is even harder than just continuing to parse, so there is a lot to be said for not trying. Also, tracking source locations slows parsing down, and it is not very useful unless you need to produce diagnostics (or need it to emit debug annotations), so you can speed up the first pass a bit by not worrying about location tracking. The driver for this two-pass strategy is sketched below.
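As a sketch, the driver is just two entry points; the names and types here are hypothetical stand-ins for a location-free LALR(1) pass and a slower, location-tracking recovery pass. Only the slow path pays for locations and error repair.

```cpp
#include <optional>
#include <string>
#include <vector>

struct Ast {};        // nodes built by the fast pass only
struct Diagnostic {   // produced by the slow pass only
  int Line = 0, Col = 0;
  std::string Message;
};

// Hypothetical front ends: a fast LALR(1) parser with no location tracking,
// and a slower (possibly backtracking) parser that tracks locations and
// attempts error recovery but never builds an AST.
std::optional<Ast> fastParse(const std::string &Source);
std::vector<Diagnostic> diagnosticParse(const std::string &Source);

std::optional<Ast> parse(const std::string &Source,
                         std::vector<Diagnostic> &Diags) {
  // Fast path: no locations, no recovery, just build the AST.
  if (auto Tree = fastParse(Source))
    return Tree;
  // Slow path: the input is already known to be bad, so spend the time on
  // locations and error repair instead of on building an AST.
  Diags = diagnosticParse(Source);
  return std::nullopt;
}
```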

Personally, I am biased against packrat parsing, because it is not clear what the actual language recognized by a PEG grammar is. Other people don't mind that so much, and YMMV.
