Parse comment line

Given the following basic grammar, I want to understand how I can handle comment lines. What is missing is the handling of the <CR><LF> that usually terminates a comment line. The only exception is the last comment line before EOF, e.g.:

    # comment
    abcd := 12 ;
    # comment eof without <CR><LF>


    grammar CommentLine1a;

    //==========================================================
    // Options
    //==========================================================

    //==========================================================
    // Lexer Rules
    //==========================================================

    Int
        :    Digit+
        ;

    fragment Digit
        :    '0'..'9'
        ;

    ID_NoDigitStart
        :    ( 'a'..'z' | 'A'..'Z' ) ('a'..'z' | 'A'..'Z' | Digit )*
        ;

    Whitespace
        :    ( ' ' | '\t' | '\r' | '\n' )+ { $channel = HIDDEN ; }
        ;

    //==========================================================
    // Parser Rules
    //==========================================================

    code
        :    ( assignment | comment )+
        ;

    assignment
        :    id_NoDigitStart ':=' id_DigitStart ';'
        ;

    id_NoDigitStart
        :    ID_NoDigitStart
        ;

    id_DigitStart
        :    ( ID_NoDigitStart | Int )+
        ;

    comment
        :    '#' ~( '\r' | '\n' )*
        ;
1 answer

Unless you have a very good reason to put the comment in the parser (which I would like to hear), you should put it in the lexer:

 Comment : '#' ~( '\r' | '\n' )* ; 

And since you already account for line breaks in your Whitespace rule, input like # comment eof without <CR><LF> causes no problem.
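
Putting this advice together, a revised grammar might look like the sketch below. It uses ANTLR 3 syntax, as in the question. The grammar name CommentLine1b, the decision to send comments to the hidden channel, and the change of the start rule to `assignment* EOF` are assumptions made for illustration, not part of the original post.

```antlr
grammar CommentLine1b;   // hypothetical name for the revised grammar

//==========================================================
// Parser Rules
//==========================================================

code
    :    assignment* EOF     // comments are hidden, so the parser never sees them
    ;

assignment
    :    id_NoDigitStart ':=' id_DigitStart ';'
    ;

id_NoDigitStart
    :    ID_NoDigitStart
    ;

id_DigitStart
    :    ( ID_NoDigitStart | Int )+
    ;

//==========================================================
// Lexer Rules
//==========================================================

// '#' followed by any characters up to, but not including, a line break.
// At EOF the loop simply stops, so no trailing <CR><LF> is required.
Comment
    :    '#' ~( '\r' | '\n' )* { $channel = HIDDEN ; }
    ;

Int
    :    Digit+
    ;

fragment Digit
    :    '0'..'9'
    ;

ID_NoDigitStart
    :    ( 'a'..'z' | 'A'..'Z' ) ( 'a'..'z' | 'A'..'Z' | Digit )*
    ;

Whitespace
    :    ( ' ' | '\t' | '\r' | '\n' )+ { $channel = HIDDEN ; }
    ;
```

If the parser does need to see comments, one option is to drop the `$channel` action from Comment and reference the token in the start rule, for example `( assignment | Comment )+`.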

In addition, if you use literal tokens inside parser rules, ANTLR automatically creates lexer rules for them behind the scenes. So in your case, the parser rule:

    comment : '#' ~( '\r' | '\n' )* ;

will match a '#' followed by zero or more tokens other than '\r' and '\n', and not zero or more characters other than '\r' and '\n'.
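
To make the difference concrete, the parser-level rule behaves roughly as if each literal had been given its own lexer rule. The names Hash, CR and LF below are hypothetical stand-ins; ANTLR generates its own internal names for such implicit tokens.

```antlr
// Rough equivalent of the parser-level comment rule once
// the literals have been turned into tokens.
comment
    :    Hash ~( CR | LF )*   // any run of TOKENS other than CR and LF
    ;

// Stand-ins for the lexer rules ANTLR creates implicitly
// (the real generated names differ).
Hash : '#'  ;
CR   : '\r' ;
LF   : '\n' ;
```

Read this way, the text after '#' is first split into tokens by the ordinary lexer rules (ID_NoDigitStart, Int, and so on), so comment text containing characters that no lexer rule matches would be expected to produce lexer errors.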

For future reference:

Inside parser rules

  • ~ negates tokens
  • . matches any token

Inside lexer rules

  • ~ negates characters
  • . matches any character in the range 0x0000 ... 0xFFFF
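
The two lists can be illustrated side by side with a small sketch in ANTLR 3 syntax. The grammar and all rule names here (NegationDemo, statement, anyToken, Semi, Str, Other) are hypothetical and only serve to show where each operator applies.

```antlr
grammar NegationDemo;   // hypothetical grammar, for illustration only

// Parser rules: ~ and . operate on whole tokens.
statement
    :    ~Semi* Semi     // zero or more tokens other than Semi, then Semi
    ;

anyToken
    :    .               // exactly one token of any type
    ;

// Lexer rules: ~ and . operate on single characters.
Semi
    :    ';'
    ;

Str
    :    '"' ~( '"' | '\r' | '\n' )* '"'   // characters other than a quote or line break
    ;

Other
    :    .               // any single character not covered by the rules above
    ;
```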