C++: what is the advantage of lex and bison over a hand-written tokenizer / parser?

I would like to do some parsing and tokenizing in C++ for learning purposes. I keep coming across bison / yacc and lex when I read about the subject online. Is there any major benefit to using them over, for example, a tokenizer / parser written with the STL or boost::regex, or perhaps even plain C?

+7
c++ tokenize parsing bison
4 answers

I recently started writing a simple lexer and parser.

It turned out that the lexer was easier to write by hand. The parser was a bit more difficult. My Bison-generated parser worked almost right off the bat, and it gave me a lot of useful messages about where I had forgotten about states. I later wrote the same parser by hand, but it took a lot more debugging before it worked properly.
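
To illustrate why a lexer is manageable by hand, here is a minimal sketch of one for a tiny arithmetic language. The token kinds and the `tokenize` function are hypothetical names chosen for this example, not something from the answer above; it only assumes integers, identifiers, the four basic operators, and parentheses.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical token kinds for a tiny arithmetic language.
enum class TokenKind { Number, Identifier, Operator, LParen, RParen };

struct Token {
    TokenKind kind;
    std::string text;
};

// Scans the input once, left to right, and never backs up.
std::vector<Token> tokenize(const std::string& input) {
    std::vector<Token> tokens;
    std::size_t i = 0;
    auto is_digit = [](char ch) { return std::isdigit(static_cast<unsigned char>(ch)) != 0; };
    auto is_ident = [](char ch) {
        return std::isalnum(static_cast<unsigned char>(ch)) != 0 || ch == '_';
    };
    while (i < input.size()) {
        const char c = input[i];
        if (std::isspace(static_cast<unsigned char>(c))) {
            ++i;  // whitespace only separates tokens
        } else if (is_digit(c)) {
            const std::size_t start = i;
            while (i < input.size() && is_digit(input[i])) ++i;
            tokens.push_back({TokenKind::Number, input.substr(start, i - start)});
        } else if (is_ident(c)) {
            // Digits were handled above, so an identifier cannot start with one.
            const std::size_t start = i;
            while (i < input.size() && is_ident(input[i])) ++i;
            tokens.push_back({TokenKind::Identifier, input.substr(start, i - start)});
        } else if (c == '(') {
            tokens.push_back({TokenKind::LParen, "("});
            ++i;
        } else if (c == ')') {
            tokens.push_back({TokenKind::RParen, ")"});
            ++i;
        } else if (std::string("+-*/").find(c) != std::string::npos) {
            tokens.push_back({TokenKind::Operator, std::string(1, c)});
            ++i;
        } else {
            throw std::runtime_error("unexpected character '" + std::string(1, c) +
                                     "' at offset " + std::to_string(i));
        }
    }
    return tokens;
}
```

Calling `tokenize("x1 + 42*(y)")` yields seven tokens, and an input such as `"a $ b"` throws a `std::runtime_error` naming the offending character. A real language would also need string literals, comments, and multi-character operators, which is where a generated lexer starts to pay off.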

The appeal of generator tools for lexers and parsers is that you can write the specification in a clean, easy-to-read language that comes close to the shortest possible expression of your specification. A hand-written parser is usually at least twice as big. In addition, the generated parser (/ lexer) comes with a lot of diagnostic code and logic to help you debug it.

A parser / lexer specification in a BNF-like language is also much easier to change if your language or requirements change. If you are dealing with a hand-written parser / lexer, you may need to dig deep into your code and make significant changes.

Finally, since they are often implemented as finite state machines without backtracking (there are gazillions of options in Bison, so this is not always a given), it is quite possible that the generated code will be more efficient than your hand-coded product.
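
As a rough illustration of what "a finite state machine without backtracking" means, here is a small table-driven recognizer for decimal numbers such as `42` or `3.14`. This is a hand-made sketch, not the code that lex or Bison actually emits; the state names, the `kTransitions` table, and `is_number` are all made up for the example.

```cpp
#include <cctype>
#include <string>

// States of the automaton; Dead is a trap state that can never accept.
enum State { Start, Int, Dot, Frac, Dead, StateCount };

// Input characters are collapsed into a few classes to keep the table small.
enum CharClass { Digit, Point, Other, ClassCount };

CharClass classify(char c) {
    if (std::isdigit(static_cast<unsigned char>(c))) return Digit;
    if (c == '.') return Point;
    return Other;
}

// kTransitions[state][class] gives the next state.
constexpr State kTransitions[StateCount][ClassCount] = {
    /* Start */ {Int,  Dead, Dead},
    /* Int   */ {Int,  Dot,  Dead},
    /* Dot   */ {Frac, Dead, Dead},
    /* Frac  */ {Frac, Dead, Dead},
    /* Dead  */ {Dead, Dead, Dead},
};

// Each character is examined exactly once: one table lookup per character.
bool is_number(const std::string& s) {
    State state = Start;
    for (char c : s) {
        state = kTransitions[state][classify(c)];
    }
    return state == Int || state == Frac;  // the accepting states
}
```

Here `is_number("3.14")` is true while `is_number("3.")` and `is_number("1.2.3")` are false. The point is that the work per input character is a single array lookup, regardless of how complicated the pattern was to write down.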

+13

Someone else has already written and DEBUGGED them for you?

+4

It's simpler, and they are more general. Bison / Lex can tokenize and parse arbitrary grammars and present the result in what may well be an easier format. They can also be faster, depending on how well you write your regular expressions.

I would not want to write my own parser in C, since the language has very little built-in sense of strings. If you do write your own, I would recommend Perl for the convenience of its regexes (or perhaps Python).

It is probably faster to use the existing tools, but it may or may not be as much fun. If you have the time, and since it's just for learning, go for it. C++ is a good language to start with.

+1

Different strokes for different folks. I personally like recursive descent parsers - I find them easy to understand, and you can make them generate better end-user error messages than those produced by tools like bison.
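
A minimal sketch of what that can look like, assuming a toy grammar of integers, `+`, `-`, `*`, and parentheses. The `Parser` class and the wording of its messages are hypothetical; the idea is only that each grammar rule becomes one function, so every error site knows exactly what it was expecting.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Grammar (one member function per rule):
//   expr   := term   (('+' | '-') term)*
//   term   := factor ('*' factor)*
//   factor := NUMBER | '(' expr ')'
class Parser {
public:
    explicit Parser(std::string text) : text_(std::move(text)) {}

    // Parses and evaluates the whole input; throws std::runtime_error on errors.
    long parse() {
        const long value = expr();
        skip_spaces();
        if (pos_ != text_.size()) fail("expected end of input");
        return value;
    }

private:
    long expr() {
        long value = term();
        for (;;) {
            if (accept('+')) value += term();
            else if (accept('-')) value -= term();
            else return value;
        }
    }

    long term() {
        long value = factor();
        while (accept('*')) value *= factor();
        return value;
    }

    long factor() {
        if (accept('(')) {
            const long value = expr();
            if (!accept(')')) fail("expected ')'");
            return value;
        }
        // accept() has already skipped any leading spaces.
        if (pos_ < text_.size() && std::isdigit(static_cast<unsigned char>(text_[pos_]))) {
            long value = 0;  // note: no overflow check in this sketch
            while (pos_ < text_.size() && std::isdigit(static_cast<unsigned char>(text_[pos_]))) {
                value = value * 10 + (text_[pos_] - '0');
                ++pos_;
            }
            return value;
        }
        fail("expected a number or '('");
    }

    // Consumes c (after optional spaces) if it is the next character.
    bool accept(char c) {
        skip_spaces();
        if (pos_ < text_.size() && text_[pos_] == c) {
            ++pos_;
            return true;
        }
        return false;
    }

    void skip_spaces() {
        while (pos_ < text_.size() && std::isspace(static_cast<unsigned char>(text_[pos_]))) ++pos_;
    }

    // Reports the 1-based column together with what the current rule expected.
    [[noreturn]] void fail(const std::string& what) const {
        throw std::runtime_error("parse error at column " + std::to_string(pos_ + 1) + ": " + what);
    }

    std::string text_;
    std::size_t pos_ = 0;
};
```

With this sketch, `Parser("1 + 2 * (3 - 1)").parse()` returns 5, and `Parser("1 + * 2").parse()` throws with the message `parse error at column 5: expected a number or '('`. Because the failure is raised inside `factor`, the message can say what was expected at that point rather than just reporting a generic syntax error.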

0