Inside lexer rules, you can invoke rules recursively, so that is one way to solve this. Another approach is to keep track of the number of open and close parentheses and let a gated semantic predicate keep a loop going as long as the counter is greater than zero.
Demonstration:
T.g
    grammar T;

    parse
      :  BeginToken {System.out.println("parsed :: " + $BeginToken.text);} EOF
      ;

    BeginToken
    @init{int open = 1;}
      :  '(' 'begin' ( {open > 0}?=>
Main.java
    import org.antlr.runtime.*;

    public class Main {
        public static void main(String[] args) throws Exception {
            String input = "(begin (define x (+ (- 1 3) 2)))";
            TLexer lexer = new TLexer(new ANTLRStringStream(input));
            TParser parser = new TParser(new CommonTokenStream(lexer));
            parser.parse();
        }
    }
    java -cp antlr-3.3-complete.jar org.antlr.Tool T.g
    javac -cp antlr-3.3-complete.jar *.java
    java -cp .:antlr-3.3-complete.jar Main
    parsed :: (begin (define x (+ (- 1 3) 2)))
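The counting logic that the gated predicate relies on can also be sketched in plain Java, independent of ANTLR. This is only an illustration of the idea, not the code ANTLR generates; the class `ParenCounter` and the method `matchBalanced` are hypothetical names introduced here.

```java
// Hypothetical helper, not part of ANTLR: it mirrors the `open` counter
// that the gated predicate {open > 0}?=> checks in the lexer rule.
class ParenCounter {

    /**
     * Returns the index just past the ')' that closes the '(' at `start`,
     * or -1 if `start` is not a '(' or the input ends while something is still open.
     */
    static int matchBalanced(String input, int start) {
        if (start < 0 || start >= input.length() || input.charAt(start) != '(') {
            return -1; // the match must begin at an opening parenthesis
        }
        int open = 1; // same role as `open` in the rule's @init block
        int i = start + 1;
        while (open > 0 && i < input.length()) { // keep looping only while open > 0
            char c = input.charAt(i++);
            if (c == '(') {
                open++;
            } else if (c == ')') {
                open--;
            }
        }
        return open == 0 ? i : -1;
    }

    public static void main(String[] args) {
        String input = "(begin (define x (+ (- 1 3) 2))) trailing";
        int end = matchBalanced(input, 0);
        System.out.println("parsed :: " + input.substring(0, end));
    }
}
```

The loop stops as soon as the counter drops back to zero, so anything after the matching `)` is left for later tokens.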
Note that you need to watch out for string literals in your source, which may contain parentheses:
    BeginToken
    @init{int open = 1;}
      :  '(' 'begin' ( {open > 0}?=>
or comments that may contain parentheses.
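To make the pitfall concrete, here is a rough Java sketch of a counter that skips over string literals and comments before counting parentheses. The literal and comment syntax is an assumption, since the source language is not specified here: double-quoted strings with backslash escapes, and `;` comments running to the end of the line. `LiteralAwareParenCounter` is a hypothetical name.

```java
// Hypothetical helper: counts parentheses but ignores those inside
// string literals and comments (syntax assumed, see the lead-in).
class LiteralAwareParenCounter {

    /** Returns the index just past the matching ')', or -1 if there is none. */
    static int matchBalanced(String input, int start) {
        if (start < 0 || start >= input.length() || input.charAt(start) != '(') {
            return -1;
        }
        int open = 1;
        int i = start + 1;
        while (open > 0 && i < input.length()) {
            char c = input.charAt(i++);
            if (c == '"') {
                // Assumed string syntax: skip to the closing quote, honoring \-escapes.
                while (i < input.length() && input.charAt(i) != '"') {
                    if (input.charAt(i) == '\\') {
                        i++; // skip the backslash; the escaped character is skipped below
                    }
                    i++;
                }
                if (i >= input.length()) {
                    return -1; // unterminated string literal
                }
                i++; // consume the closing quote
            } else if (c == ';') {
                // Assumed comment syntax: ignore everything up to the end of the line.
                while (i < input.length() && input.charAt(i) != '\n') {
                    i++;
                }
            } else if (c == '(') {
                open++;
            } else if (c == ')') {
                open--;
            }
        }
        return open == 0 ? i : -1;
    }

    public static void main(String[] args) {
        String input = "(begin (display \"(\") x)";
        // A counter that did not skip the literal would see an extra '(' and never reach zero.
        System.out.println(matchBalanced(input, 0) == input.length());
    }
}
```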
The predicate clause uses some language-specific code (in this case, Java). The advantage of recursively invoking the lexer rule is that you do not have custom code in your lexer:
    BeginToken
      :  '(' Spaces? 'begin' Spaces? NestedParens Spaces? ')'
      ;

    fragment NestedParens
      :  '(' ( ~('(' | ')') | NestedParens )* ')'
      ;

    fragment Spaces
      :  (' ' | '\t')+
      ;
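For comparison with the counter-based version, the recursive `NestedParens` fragment can be read as a small recursive function. The following Java sketch is a hand-written approximation of that rule, not what ANTLR emits; `NestedParensMatcher` and `nestedParens` are hypothetical names.

```java
// Hypothetical hand-written equivalent of:
//   fragment NestedParens : '(' ( ~('(' | ')') | NestedParens )* ')' ;
class NestedParensMatcher {

    /** Returns the index just past the matched group, or -1 if there is no match at `pos`. */
    static int nestedParens(String s, int pos) {
        if (pos < 0 || pos >= s.length() || s.charAt(pos) != '(') {
            return -1; // the rule must start with '('
        }
        int i = pos + 1;
        while (i < s.length() && s.charAt(i) != ')') {
            if (s.charAt(i) == '(') {
                i = nestedParens(s, i); // recursive alternative: a nested group
                if (i < 0) {
                    return -1; // the nested group was never closed
                }
            } else {
                i++; // the ~('(' | ')') alternative: any other character
            }
        }
        return i < s.length() ? i + 1 : -1; // consume the closing ')'
    }

    public static void main(String[] args) {
        String input = "(define x (+ (- 1 3) 2))";
        System.out.println(nestedParens(input, 0) == input.length());
    }
}
```

No counter is needed here: the nesting depth is carried by the call stack, which is the same reason the recursive lexer rule needs no embedded Java.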