ANTLR - left recursion removal help

I have this grammar with left recursion, and I don't understand how I can make it non-left recursive. This is my first time we work with parsers / grammars etc., so please just explain everything.

msg: IDENTIFIER | IDENTIFIER LBRACKET msg RBRACKET | msg COMMA message | LBRACE msg RBRACE LBRACE atom RBRACE | msg XOR msg | msg PERCENT IDENTIFIER | IDENTIFIER PERCENT msg | LBRACKET msg RBRACKET ; atom: IDENTIFIER | fn_app ; fn_app: IDENTIFIER LBRACKET IDENTIFIER (COMMA IDENTIFIER)* RBRACKET; 

I tried on my own, but ANTLR is still talking about recursion, and I don't understand why.

ANTLR says the following:

 [fatal] rule msg_contents has non-LL(*) decision due to recursive rule invocations reachable from alts 1,3. Resolve by left-factoring or using syntactic predicates or using backtrack=true option. 

My attempt:

 msg_contents: msg_part | msg_part XOR msg_part | msg_part PERCENT msg_part ; msg_part : IDENTIFIER | IDENTIFIER LBRACKET msg_part RBRACKET | LBRACE msg_part RBRACE LBRACE atom RBRACE | IDENTIFIER PERCENT msg_part | LBRACKET msg_part RBRACKET ; 

Please, help. Thanks!

Ps If possible, provide an explanation or steps to remove recursion from such a grammar.

+4
source share
1 answer

In a nutshell, when you delete the immediate left recursion (when you look at it), you drop the recursive link and replace

  A ::= A x | y 

by

  A ::= yx* 

In your case, this means factoring on

 msg: msg ( COMMA message | XOR msg | PERCENT IDENTIFIER ) | ( IDENTIFIER | IDENTIFIER LBRACKET msg RBRACKET | LBRACE msg RBRACE LBRACE atom RBRACE | IDENTIFIER PERCENT msg | LBRACKET msg RBRACKET ) ; 

and replacing with

 msg: ( IDENTIFIER | IDENTIFIER LBRACKET msg RBRACKET | LBRACE msg RBRACE LBRACE atom RBRACE | IDENTIFIER PERCENT msg | LBRACKET msg RBRACKET ) ( COMMA message | XOR msg | PERCENT IDENTIFIER )* ; 

Wikipedia in left recursion explains this pretty well.

The ANTLR message you received is not related to left recursion. It says that ANTLR cannot decide between alternatives

 msg_contents: msg_part | msg_part XOR msg_part | msg_part PERCENT msg_part ; 

because everything starts with msg_part , which is recursive and therefore not regular, as is required for LL (*). However, this can be solved by left factoring. Also note that your attempt skipped the COMMA option.

+5
source

All Articles