Range quantifier syntax in ANTLR Regex

Question

This should be pretty simple. I am working on a lexer grammar in ANTLR and want to limit the maximum length of variable identifiers to 30 characters. I tried to do this with the following line (it follows the usual regular-expression syntax, apart from ANTLR's own notation for literals):

ID : ('a'..'z'|'A'..'Z') ('a'..'z'|'A'..'Z'|'0'..'9'|'_'){0,29} {System.out.println("IDENTIFIER FOUND.");} ;

Code generation reported no errors, but compilation failed because of a line in the generated code that read simply:

0,29

Obviously, ANTLR is taking the text between the braces and placing it in the accept-state section of the generated code, along with the print statement. I searched the ANTLR site and found no example of, or reference to, an equivalent expression. What should the syntax for this expression be?

java regex antlr

user1634761 Aug 30 '12 at 1:44

1 answer

walrii · Accepted Answer · 2012-08-30T03:08:41+0000

ANTLR does not support the {m,n} quantifier syntax. ANTLR sees the {} of your quantifier and cannot distinguish it from the {} that surrounds your actions, so it treats 0,29 as action code.

Workarounds:

1. Enforce the limit semantically. Let the lexer collect an identifier of unlimited length, then complain about it or truncate it in your action code or in a later compiler phase.
2. Write the quantification out by hand in the rule.

Here is an example of a hand-written rule that restricts identifiers to 8 characters (one leading letter plus up to seven SUBID characters):

 // fragment: helper rule only, so the lexer never emits SUBID as a token of its own
 fragment SUBID : ('a'..'z'|'A'..'Z'|'0'..'9'|'_') ;
 ID : ('a'..'z'|'A'..'Z') (SUBID (SUBID (SUBID (SUBID (SUBID (SUBID SUBID?)?)?)?)?)?)? ;
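Writing the nesting out by hand becomes tedious for the 30-character limit in the question (one letter plus up to 29 SUBID characters). A minimal sketch of a generator for the nested-optional pattern follows; the class and method names (NestedOptionalRule, nestedOptional) are hypothetical and not part of ANTLR, and the output is meant to be pasted into the grammar by hand.

```java
// Sketch only: builds the nested "(X (X X?)?)?" text used in the rule above.
// NestedOptionalRule and nestedOptional are illustrative names, not ANTLR APIs.
class NestedOptionalRule {

    // Returns grammar text that matches between 0 and maxRepeats occurrences of sub.
    static String nestedOptional(String sub, int maxRepeats) {
        if (maxRepeats <= 0) {
            return "";
        }
        // Innermost element: a single optional occurrence.
        StringBuilder sb = new StringBuilder(sub).append("?");
        // Each pass wraps the current text in one more optional group.
        for (int i = 1; i < maxRepeats; i++) {
            sb.insert(0, "(" + sub + " ").append(")?");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 29 optional SUBIDs after the leading letter gives a 30-character maximum.
        System.out.println("ID : ('a'..'z'|'A'..'Z') " + nestedOptional("SUBID", 29) + " ;");
    }
}
```

Calling nestedOptional("SUBID", 7) reproduces the tail of the 8-character ID rule shown above.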

Personally, I would go with the semantic solution (#1). Nowadays there is very little reason to restrict identifier length in a language, and even less reason to raise a syntax error (aborting compilation early) when such a rule is violated.
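The semantic approach could look roughly like the sketch below. It assumes the lexer matches identifiers of any length and that the matched text is passed to a helper from the rule's action or a later pass; IdentifierLengthCheck, limit, and the warnings list are hypothetical names for illustration, and how you obtain the token text depends on your ANTLR version.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: enforces the length limit after lexing instead of in the grammar.
// All names here are illustrative; none of them come from ANTLR.
class IdentifierLengthCheck {
    static final int MAX_ID_LENGTH = 30;

    // Collected diagnostics; a real compiler would use its own error reporter.
    static final List<String> warnings = new ArrayList<>();

    // Returns the identifier unchanged if it fits, otherwise records a
    // warning and returns the first MAX_ID_LENGTH characters.
    static String limit(String id) {
        if (id.length() <= MAX_ID_LENGTH) {
            return id;
        }
        warnings.add("identifier '" + id + "' is longer than "
                + MAX_ID_LENGTH + " characters; truncated");
        return id.substring(0, MAX_ID_LENGTH);
    }

    public static void main(String[] args) {
        System.out.println(limit("counter"));
        System.out.println(limit("a_very_long_identifier_name_over_thirty"));
        for (String w : warnings) {
            System.out.println("warning: " + w);
        }
    }
}
```

Reporting a warning and continuing, rather than failing in the lexer, keeps the rest of the compilation going so that other errors in the same file are still found.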
