Best parser generator to parse many small C ++ texts?

For performance reasons, I am migrating the C # library to C ++. During normal operation, this library, among other things, needs to analyze approximately 150,000 mathematical expressions (think excel formulas) with an average length of less than 150 characters.

In the C # version, I used the GOLD parser to generate the parsing code. It can analyze all 150,000 expressions in one second.

Since we were thinking about expanding our language, I decided that switching to C ++ might be a good chance to switch to ANTLR. I ported the (simple) grammar to ANTLR and generated C code from it. Parsing 150,000 expressions takes more than 12 seconds, because for each expression I need to create a new ANTL3_INPUT_STREAM, token, lexer and parser - at least in version 3.4 there is no way to reuse them.

I would appreciate it if someone could give me a recommendation on what to use instead - GOLD, of course, an option, although creating code in C ++ or C seems a lot more complicated than C #. My grammar is compatible with LALR and LL (1). The priority is to analyze performance on small inputs.

+1
c ++ performance antlr parser-generator gold-parser
source share
6 answers

I would try boost :: spirit. This is often very fast (even for parsing simple things like an integer, it can be faster than the C atoi function http://alexott.blogspot.com/2010/01/boostspirit2-vs-atoi.html )

http://boost-spirit.com/home/

It has good things: just a headline, so hellish hell, liberal license.

However, it should be warned that the learning curve is difficult. This is modern C ++ (without a pointer, but a lot of patterns and very unpleasant compilation errors), therefore, based on C or C #, it may not be very convenient for you.

+9
source share

If the grammar to be parsed is simple, you can simply write the parser manually.

Most parser generators are designed to simplify hacking a working analyzer, and as a result, runtime often occurs.

+4
source share

The best performance I've seen while parsing came from Boost.Spirit.Qi, which expresses grammar in C ++ using meta-template programming. This is not for the faint of heart, though.

This will need to be well isolated, and the compilation time of the file containing the parser will increase to a few seconds (so make sure that there will be as little as possible).

+3
source share

If the syntax of your expression is simple enough, consider creating a hand-written recursive descent parser . It can work very quickly and gives you the opportunity (with sufficient care) to report good and specific syntax errors.

You can also use bison , but I believe that a hand-written recursive parser is more likely to be faster.

And you can do the lexing with the flex generated lexer and do the parsing manually using the recursive descent.

For your information, the GCC compiler has its own recursive descent parsers for C ++ and C, at least. It no longer uses parser generators (e.g. bison or ANTLR).

+2
source share

Instead of expr you recognize the sequence-of-expr grammar.

EDIT:

Instead of the bison syntax:

 start: expr { process_expr ($1); } ; 

It has:

 start: expr_seq ; expr_seq: expr { process_expr ($1); } | expr_seq expr { process_expr ($2); } ; 
+1
source share

I wrote a lot of parsers, and manual recursive descent is how I do it. They are easy to write and largely optimal.

However, if speed is what you need, no matter what you write, there will be enough room to speed it up. It will be so that you can surprise you, because all that you might think you have already done.

Here is a slide set showing how to do this.

+1
source share

All Articles