Suppose you want to parse simple expressions consisting of the following tokens:
- '-' subtraction (also unary);
- '+' addition;
- '*' multiplication;
- '/' division;
- '(...)' grouping of (sub) expressions;
- integer and decimal numbers.
An ANTLR grammar for this could look like this:
    grammar Expression;

    options {
      language=CSharp2;
    }

    parse
      :  exp EOF
      ;

    exp
      :  addExp
      ;

    addExp
      :  mulExp (('+' | '-') mulExp)*
      ;

    mulExp
      :  unaryExp (('*' | '/') unaryExp)*
      ;

    unaryExp
      :  '-' atom
      |  atom
      ;

    atom
      :  Number
      |  '(' exp ')'
      ;

    Number
      :  ('0'..'9')+ ('.' ('0'..'9')+)?
      ;
Now, to create a proper AST, you add output=AST; to your options { ... } section and mix some "tree operators" into your grammar, which define the tokens that should become the root of a tree. There are two ways to do this:
- add ^ and ! after your tokens. The ^ makes the token the root, and the ! excludes the token from the AST;
- use "rewrite rules": ... -> ^(Root Child Child ...).
Take the foo rule, for example:
foo : TokenA TokenB TokenC TokenD ;
and say that you want TokenB to become the root, with TokenA and TokenC as its children, and you want to exclude TokenD from the tree. Here's how to do that using option 1:
foo : TokenA TokenB^ TokenC TokenD! ;
and here is how to do it using option 2:
foo : TokenA TokenB TokenC TokenD -> ^(TokenB TokenA TokenC) ;
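Both variants yield the same tree shape. As a minimal, ANTLR-free sketch of that shape (the Node class and its ToStringTree method are hypothetical helpers written for this illustration, not part of the ANTLR runtime; they only mimic the ^(Root Child ...) notation):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for an AST node; not part of the ANTLR runtime.
class Node
{
    public readonly string Text;
    public readonly List<Node> Children = new List<Node>();

    public Node(string text, params Node[] children)
    {
        Text = text;
        Children.AddRange(children);
    }

    // Render as "(root child child ...)"; a leaf is rendered as its text.
    public string ToStringTree()
    {
        if (Children.Count == 0) return Text;
        return "(" + Text + " "
            + string.Join(" ", Children.Select(c => c.ToStringTree())) + ")";
    }
}

class TreeShapeDemo
{
    // TokenB is the root, TokenA and TokenC are its children,
    // and TokenD does not appear in the tree at all.
    public static Node BuildFoo()
    {
        return new Node("TokenB", new Node("TokenA"), new Node("TokenC"));
    }

    static void Main()
    {
        Console.WriteLine(BuildFoo().ToStringTree()); // (TokenB TokenA TokenC)
    }
}
```

Note that the children keep the order in which they were listed, which is why TokenA comes before TokenC.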
So, here is a grammar with tree operators in it:
    grammar Expression;

    options {
      language=CSharp2;
      output=AST;
    }

    tokens {
      ROOT;
      UNARY_MIN;
    }

    @parser::namespace { Demo.Antlr }
    @lexer::namespace { Demo.Antlr }

    parse
      :  exp EOF -> ^(ROOT exp)
      ;

    exp
      :  addExp
      ;

    addExp
      :  mulExp (('+' | '-')^ mulExp)*
      ;

    mulExp
      :  unaryExp (('*' | '/')^ unaryExp)*
      ;

    unaryExp
      :  '-' atom -> ^(UNARY_MIN atom)
      |  atom
      ;

    atom
      :  Number
      |  '(' exp ')' -> exp
      ;

    Number
      :  ('0'..'9')+ ('.' ('0'..'9')+)?
      ;

    Space
      :  (' ' | '\t' | '\r' | '\n'){Skip();}
      ;
I also added a Space rule to ignore any white space in the source and added some extra tokens and namespaces for the lexer and parser. Note that the order is important (options { ... } first, then tokens { ... } and finally the @... {} namespace declarations).
That's it.
Now create the lexer and parser from the grammar file:
java -cp antlr-3.2.jar org.antlr.Tool Expression.g
and put the generated .cs files in your project together with the C# runtime DLL.
You can test it using the following class:
    using System;
    using Antlr.Runtime;
    using Antlr.Runtime.Tree;
    using Antlr.StringTemplate;

    namespace Demo.Antlr
    {
      class MainClass
      {
        // Prints the tree in pre-order, indenting each node by its depth.
        // Only the first two children are visited, which is enough here:
        // no node in this AST has more than two children.
        public static void Preorder(ITree Tree, int Depth)
        {
          if(Tree == null)
          {
            return;
          }

          for (int i = 0; i < Depth; i++)
          {
            Console.Write(" ");
          }

          Console.WriteLine(Tree);

          Preorder(Tree.GetChild(0), Depth + 1);
          Preorder(Tree.GetChild(1), Depth + 1);
        }

        public static void Main (string[] args)
        {
          ANTLRStringStream Input = new ANTLRStringStream("(12.5 + 56 / -7) * 0.5");
          ExpressionLexer Lexer = new ExpressionLexer(Input);
          CommonTokenStream Tokens = new CommonTokenStream(Lexer);
          ExpressionParser Parser = new ExpressionParser(Tokens);

          // With output=AST, the rule returns an object that carries the tree.
          ExpressionParser.parse_return ParseReturn = Parser.parse();
          CommonTree Tree = (CommonTree)ParseReturn.Tree;

          Preorder(Tree, 0);
        }
      }
    }
which produces the following output:
    ROOT
     *
      +
       12.5
       /
        56
        UNARY_MIN
         7
      0.5
which corresponds to the following AST:

(diagram created using graph.gafol.net)
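The Preorder method above visits only GetChild(0) and GetChild(1), which suffices for this grammar. For trees with more children per node, the same traversal can loop over all of them. The following is a self-contained sketch of that idea; it does not use the ANTLR runtime, and the AstNode class, BuildSample and the StringBuilder parameter are assumptions made purely for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical stand-in for an AST node; not part of the ANTLR runtime.
class AstNode
{
    public readonly string Text;
    public readonly List<AstNode> Children = new List<AstNode>();

    public AstNode(string text, params AstNode[] children)
    {
        Text = text;
        Children.AddRange(children);
    }
}

class PreorderDemo
{
    // Visit the node first, then each of its children from left to right.
    // Each node is indented by one space per level of depth.
    public static void Preorder(AstNode node, int depth, StringBuilder output)
    {
        output.Append(' ', depth).AppendLine(node.Text);
        foreach (AstNode child in node.Children)
        {
            Preorder(child, depth + 1, output);
        }
    }

    // The AST for "(12.5 + 56 / -7) * 0.5", built by hand.
    public static AstNode BuildSample()
    {
        return new AstNode("ROOT",
            new AstNode("*",
                new AstNode("+",
                    new AstNode("12.5"),
                    new AstNode("/",
                        new AstNode("56"),
                        new AstNode("UNARY_MIN", new AstNode("7")))),
                new AstNode("0.5")));
    }

    static void Main()
    {
        StringBuilder output = new StringBuilder();
        Preorder(BuildSample(), 0, output);
        Console.Write(output);
    }
}
```

With the real runtime, the equivalent loop would presumably run over the node's child count instead of a List, but the traversal order is the same.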
Please note that ANTLR 3.3 has just been released and its CSharp target is "in beta". That is why I used ANTLR 3.2 in my example.
In the case of fairly simple languages (like my example above), you can also evaluate the result "on the fly" without creating an AST. You can do that by embedding plain C# code in your grammar file and letting the parser rules return a specific value.
Here is an example:
    grammar Expression;

    options {
      language=CSharp2;
    }

    @parser::namespace { Demo.Antlr }
    @lexer::namespace { Demo.Antlr }

    parse returns [double value]
      :  exp EOF {$value = $exp.value;}
      ;

    exp returns [double value]
      :  addExp {$value = $addExp.value;}
      ;

    addExp returns [double value]
      :  a=mulExp       {$value = $a.value;}
         ( '+' b=mulExp {$value += $b.value;}
         | '-' b=mulExp {$value -= $b.value;}
         )*
      ;

    mulExp returns [double value]
      :  a=unaryExp       {$value = $a.value;}
         ( '*' b=unaryExp {$value *= $b.value;}
         | '/' b=unaryExp {$value /= $b.value;}
         )*
      ;

    unaryExp returns [double value]
      :  '-' atom {$value = -1.0 * $atom.value;}
      |  atom     {$value = $atom.value;}
      ;

    atom returns [double value]
      :  Number      {$value = Double.Parse($Number.Text, CultureInfo.InvariantCulture);}
      |  '(' exp ')' {$value = $exp.value;}
      ;

    Number
      :  ('0'..'9')+ ('.' ('0'..'9')+)?
      ;

    Space
      :  (' ' | '\t' | '\r' | '\n'){Skip();}
      ;
which can be tested using the class:
    using System;
    using Antlr.Runtime;
    using Antlr.Runtime.Tree;
    using Antlr.StringTemplate;

    namespace Demo.Antlr
    {
      class MainClass
      {
        public static void Main (string[] args)
        {
          string expression = "(12.5 + 56 / -7) * 0.5";
          ANTLRStringStream Input = new ANTLRStringStream(expression);
          ExpressionLexer Lexer = new ExpressionLexer(Input);
          CommonTokenStream Tokens = new CommonTokenStream(Lexer);
          ExpressionParser Parser = new ExpressionParser(Tokens);
          Console.WriteLine(expression + " = " + Parser.parse());
        }
      }
    }
and outputs the following result:
    (12.5 + 56 / -7) * 0.5 = 2.25
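To make it easier to see what the embedded actions compute, here is a hand-written recursive-descent sketch with one method per parser rule. It is not the code ANTLR generates; the MiniEvaluator class and all of its member names are assumptions made for this illustration, and it handles white space and numbers inline instead of using a separate lexer:

```csharp
using System;
using System.Globalization;

// Hand-written sketch mirroring the grammar rules; NOT ANTLR-generated code.
class MiniEvaluator
{
    private readonly string src;
    private int pos;

    public MiniEvaluator(string source) { src = source; }

    // parse : exp EOF
    public double Parse()
    {
        double value = AddExp();
        SkipSpace();
        if (pos != src.Length) throw new FormatException("Unexpected input at " + pos);
        return value;
    }

    // addExp : mulExp (('+' | '-') mulExp)*
    private double AddExp()
    {
        double value = MulExp();
        while (true)
        {
            if (Accept('+')) value += MulExp();
            else if (Accept('-')) value -= MulExp();
            else return value;
        }
    }

    // mulExp : unaryExp (('*' | '/') unaryExp)*
    private double MulExp()
    {
        double value = UnaryExp();
        while (true)
        {
            if (Accept('*')) value *= UnaryExp();
            else if (Accept('/')) value /= UnaryExp();
            else return value;
        }
    }

    // unaryExp : '-' atom | atom
    private double UnaryExp()
    {
        return Accept('-') ? -Atom() : Atom();
    }

    // atom : Number | '(' exp ')'
    private double Atom()
    {
        if (Accept('('))
        {
            double value = AddExp();
            if (!Accept(')')) throw new FormatException("Expected ')' at " + pos);
            return value;
        }
        // Number : ('0'..'9')+ ('.' ('0'..'9')+)?
        SkipSpace();
        int start = pos;
        while (pos < src.Length && char.IsDigit(src[pos])) pos++;
        if (pos + 1 < src.Length && src[pos] == '.' && char.IsDigit(src[pos + 1]))
        {
            pos++;
            while (pos < src.Length && char.IsDigit(src[pos])) pos++;
        }
        if (start == pos) throw new FormatException("Expected a number at " + pos);
        return double.Parse(src.Substring(start, pos - start), CultureInfo.InvariantCulture);
    }

    private void SkipSpace()
    {
        while (pos < src.Length && char.IsWhiteSpace(src[pos])) pos++;
    }

    // Consume the given character (after optional white space) if it is next.
    private bool Accept(char c)
    {
        SkipSpace();
        if (pos < src.Length && src[pos] == c) { pos++; return true; }
        return false;
    }

    static void Main()
    {
        string expression = "(12.5 + 56 / -7) * 0.5";
        double result = new MiniEvaluator(expression).Parse();
        Console.WriteLine(expression + " = " + result.ToString(CultureInfo.InvariantCulture));
    }
}
```

Each method returns the value of the sub-expression it matched, just as each rule in the grammar above returns a double, so no tree is ever built.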
EDIT
In the comments, Ralph wrote:
Tip for those using Visual Studio: you can put something like java -cp "$(ProjectDir)antlr-3.2.jar" org.antlr.Tool "$(ProjectDir)Expression.g" in the pre-build events; then you can just modify your grammar and run the project without having to worry about rebuilding the lexer/parser.