PEG and whitespace / comments

I have experience writing parsers with ANTLR, and I'm trying (for self-education :)) to port one of them to PEG (Parsing Expression Grammar).

As I try to wrap my head around the idea, one thing looks so cumbersome that I feel I must have missed something: how to deal with whitespace.

In ANTLR, the usual way to deal with whitespace and comments is to put the tokens on a hidden channel, but PEG grammars have no tokenization step. For languages such as C or Java, where comments are allowed almost everywhere, one would like to "hide" the comments right away, but since comments can carry semantic meaning (for example, when generating code documentation, class diagrams, etc.), one doesn't want to simply drop them.

So is there a way to handle this?

+8
parser-generator peg
2 answers

Since there is no separate tokenization phase, there is no point at which certain characters (or tokens) could be discarded.

Since you are familiar with ANTLR, think of it this way: let ANTLR handle only a PEG. That way you have only parser rules and no lexer rules. Now, how would you discard, say, spaces? (You can't.)

So, the answer to your question is: you can't. You have to litter your PEG grammar with whitespace rules:

ANTLR

    add_expr : Num Add Num ;
    Add      : '+' ;
    Num      : '0'..'9'+ ;
    Space    : ' '+ {skip();} ;

PEG

    add_expr : num _ '+' _ num ;
    num      : '0'..'9'+ ;
    _        : ' '* ;
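To make the effect of the `_` rule concrete, here is a minimal hand-written sketch of the PEG above in Python. It is only an illustration, not the output of any parser generator: the function names (`skip`, `num`, `add_expr`) and the choice of `/* ... */` as comment syntax are assumptions. The point it tries to show is that `_` can record comments instead of dropping them, which is what the question asks for.

```python
import re

# Hypothetical hand-written equivalent of the PEG above. The `_` rule is
# extended (an assumption) to accept /* ... */ comments as well as spaces.
_SKIP = re.compile(r"(?: +|/\*.*?\*/)*", re.DOTALL)
_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)
_NUM = re.compile(r"[0-9]+")


def skip(text, pos, comments):
    """Rule `_`: consume spaces and comments, keeping the comment text."""
    m = _SKIP.match(text, pos)  # always matches, possibly the empty string
    comments.extend(_COMMENT.findall(m.group()))
    return m.end()


def num(text, pos):
    """Rule `num`: one or more digits. Returns (value, new_pos) or None."""
    m = _NUM.match(text, pos)
    return (int(m.group()), m.end()) if m else None


def add_expr(text):
    """Rule `add_expr : num _ '+' _ num`.

    Returns (left, right, comments) or None. Treating leftover input as a
    failure is an assumption of this sketch, not part of the grammar above.
    """
    comments = []
    left = num(text, 0)
    if left is None:
        return None
    pos = skip(text, left[1], comments)
    if not text.startswith("+", pos):
        return None
    pos = skip(text, pos + 1, comments)
    right = num(text, pos)
    if right is None or right[1] != len(text):
        return None
    return left[0], right[0], comments
```

Calling `add_expr("1 /* left */ + 2")` yields the two numbers together with the list of comments that were skipped, so a later pass (a documentation generator, say) could still use them.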
+8

You can stack PEG parsers. The idea is that the first parser consumes characters and feeds tokens to the second parser. The second PEG parser consumes tokens and does the real work.

Of course, this means giving up one of the advantages Parsing Expression Grammars have over other parsing schemes: their simplicity.
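A rough sketch of that two-stage arrangement, in Python, might look like the following. Everything here is hypothetical: the token kinds, the `tokenize` and `parse_add_expr` names, and the `/* ... */` comment syntax are assumptions, and the second stage is a trivial sequence check standing in for a real PEG parser over tokens. It only illustrates that the first stage can drop spaces while passing comments on as ordinary tokens.

```python
import re

# Stage 1 (hypothetical): turn characters into (kind, text) tokens.
TOKEN = re.compile(
    r"(?P<num>[0-9]+)|(?P<add>\+)|(?P<comment>/\*.*?\*/)|(?P<space> +)",
    re.DOTALL,
)


def tokenize(text):
    """Consume characters and produce tokens; spaces are dropped here."""
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        if m.lastgroup != "space":  # comments are kept as tokens
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


# Stage 2 (hypothetical): consume tokens instead of characters.
def parse_add_expr(tokens):
    """Match `num add num`; comments are set aside rather than lost."""
    comments = [text for kind, text in tokens if kind == "comment"]
    significant = [(kind, text) for kind, text in tokens if kind != "comment"]
    if [kind for kind, _ in significant] != ["num", "add", "num"]:
        return None
    return int(significant[0][1]), int(significant[2][1]), comments
```

With this split, the grammar of the second stage no longer needs a `_` rule between every pair of symbols, at the cost of maintaining two parsers.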

+2