I have the following grammar for ANTLR 4:
    grammar Pattern;

    //parser rules
    parse : string LBRACK CHAR DASH CHAR RBRACK ;
    string : (CHAR | DASH)+ ;

    //lexer rules
    DASH : '-' ;
    LBRACK : '[' ;
    RBRACK : ']' ;
    CHAR : [A-Za-z0-9] ;
I'm trying to parse the following line:
ab-cd[0-9]
The parser reads ab-cd on the left, which my application treats as a literal string. It then parses [0-9] as a character set, which in this case means any digit. My grammar works, except that I don't like having (CHAR | DASH)+ as a parser rule when it is really just a token. I would prefer the lexer to create a STRING token and give me the following tokens:
"ab-cd" "[" "0" "-" "9" "]"
instead of these:
"ab" "-" "cd" "[" "0" "-" "9" "]"
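The difficulty with the desired token stream is that a dash means different things depending on context: outside the brackets it belongs to the STRING, inside them it is a separate DASH. A single-state, longest-match lexer cannot tell these apart. For example, a rule like STRING : [A-Za-z0-9-]+ would also match 0-9 inside the brackets as one token. A tokenizer with two states (outside / inside a set) can. In ANTLR 4 the corresponding feature is lexer modes (pushMode/popMode), which are only allowed in a separate lexer grammar. The Python below is a hand-written sketch of that two-state idea, not ANTLR-generated code; the names tokenize and CHARS are hypothetical.

```python
import string
from typing import List

# Characters allowed by CHAR in the grammar: [A-Za-z0-9]
CHARS = set(string.ascii_letters + string.digits)


def tokenize(pattern: str) -> List[str]:
    """Split e.g. 'ab-cd[0-9]' into tokens using two lexer states."""
    tokens: List[str] = []
    in_set = False  # False: default state, True: inside [...]
    i, n = 0, len(pattern)
    while i < n:
        ch = pattern[i]
        if not in_set:
            if ch == "[":
                tokens.append(ch)  # LBRACK: switch to the set state
                in_set = True
                i += 1
            elif ch in CHARS or ch == "-":
                # STRING: greedily take the longest run of CHAR or '-'
                start = i
                while i < n and (pattern[i] in CHARS or pattern[i] == "-"):
                    i += 1
                tokens.append(pattern[start:i])
            else:
                raise ValueError(f"unexpected {ch!r} at position {i}")
        else:
            if ch == "]":
                tokens.append(ch)  # RBRACK: back to the default state
                in_set = False
            elif ch in CHARS or ch == "-":
                tokens.append(ch)  # CHAR or DASH, one character each
            else:
                raise ValueError(f"unexpected {ch!r} at position {i}")
            i += 1
    if in_set:
        raise ValueError("unterminated character set")
    return tokens


print(tokenize("ab-cd[0-9]"))  # ['ab-cd', '[', '0', '-', '9', ']']
```

The state flag plays the role of the lexer mode: the same dash character is folded into STRING in one state and emitted on its own in the other.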
I looked at other examples but could not figure it out. Typically, those examples put quotation marks around such string literals or rely on whitespace to separate the input. I would like to avoid both. Can this be done with lexer rules, or do I need to keep handling it in the parser rules, as I do now?
Charles