ANTLR - Implicit AND tokens in a tree

I am trying to create a grammar that interprets user-entered text, search-engine style. It will support the logical operators AND, OR, NOT and ANDNOT. Almost everything works, but I want to add a rule that two adjacent words outside a quoted string are implicitly treated as if they were joined by AND. For instance:

cheese and crackers = cheese AND crackers

(up and down) or (left and right) = (up AND down) OR (left AND right)

cat dog "fancy pig" = cat AND dog AND "fancy pig"

I am having trouble with the last one, and I hope someone can point me in the right direction. Here's my *.g file so far; please be kind, as my ANTLR experience spans less than a working day:

    grammar SearchEngine;

    options {
      language = CSharp2;
      output = AST;
    }

    @lexer::namespace { Demo.SearchEngine }
    @parser::namespace { Demo.SearchEngine }

    LPARENTHESIS : '(';
    RPARENTHESIS : ')';

    AND    : ('A'|'a')('N'|'n')('D'|'d');
    OR     : ('O'|'o')('R'|'r');
    ANDNOT : ('A'|'a')('N'|'n')('D'|'d')('N'|'n')('O'|'o')('T'|'t');
    NOT    : ('N'|'n')('O'|'o')('T'|'t');

    fragment CHARACTER : ('a'..'z'|'A'..'Z'|'0'..'9');
    fragment QUOTE     : ('"');
    fragment SPACE     : (' '|'\n'|'\r'|'\t'|'\u000C');

    WS     : (SPACE) { $channel=HIDDEN; };
    PHRASE : (QUOTE)(CHARACTER)+((SPACE)+(CHARACTER)+)+(QUOTE);
    WORD   : (CHARACTER)+;

    startExpression  : andExpression;
    andExpression    : andnotExpression (AND^ andnotExpression)*;
    andnotExpression : orExpression (ANDNOT^ orExpression)*;
    orExpression     : notExpression (OR^ notExpression)*;
    notExpression    : (NOT^)? atomicExpression;
    atomicExpression : PHRASE | WORD | LPARENTHESIS! andExpression RPARENTHESIS!;
1 answer

Because the AND keyword in your andExpression rule becomes optional, you need to create an imaginary AND token and use a rewrite rule to "insert" that token into your tree. In this case you cannot use ANTLR's shorthand root operator ^ ; you need the rewrite operator -> instead.

Your andExpression should look like this:

    andExpression
      : (andnotExpression        -> andnotExpression)
        (AND? a=andnotExpression -> ^(AndNode $andExpression $a))*
      ;

A detailed description of this (possibly cryptic) notation is given in chapter 7, section "Rewrite Rules in Subrules", pages 173-174 of The Definitive ANTLR Reference by Terence Parr.

I ran a quick test to see whether the grammar produces the correct AST with the new andExpression rule. After parsing the string cat dog "potbelly and pig" and FOO, the generated parser produced the following AST:

[image: http://img580.imageshack.us/img580/7370/andtree.png]

Note that AndNode and Root are imaginary tokens.
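Imaginary tokens have to be declared before a rewrite rule can refer to them. The fragment below is a minimal sketch of how that could look in an ANTLR 3 grammar; it is meant to be merged into the question's grammar rather than used on its own. The `tokens` section is standard ANTLR 3 syntax, but the `startExpression` rewrite that wraps everything in `Root` is an assumption about how the tree in the image was produced, since the answer does not show that rule.

```antlr
// Goes after the options block and before the first rule.
// Imaginary tokens are never matched by the lexer; they only label AST nodes.
tokens {
  AndNode;
  Root;
}

// Assumed entry rule: wraps the whole expression in a Root node.
// (The answer mentions Root but does not show how it is created.)
startExpression
  : andExpression EOF -> ^(Root andExpression)
  ;
```

Requiring `EOF` in the entry rule is optional, but it makes the parser report trailing input it could not match instead of silently stopping.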

If you want to know how to create an AST image like the one above, see this thread: Visualizing an AST Created Using ANTLR (.NET)

EDIT

When parsing either one two three or (one two) three, the following AST is created:

[image: http://img203.imageshack.us/img203/2558/69551879.png]

And when parsing (one two) OR three, this AST is created:

[image: http://img340.imageshack.us/img340/8779/73390353.png]

which appears to be correct in all cases.
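For readers who cannot load the images, the tree shapes can be worked out by hand from the andExpression rule. The listing below is such a hand trace, not captured parser output, written in LISP-style notation where the first item inside each pair of parentheses is the node and the rest are its children. Any outer Root wrapper is left out.

```text
input:  one two three
        start          -> one
        after "two"    -> (AndNode one two)
        after "three"  -> (AndNode (AndNode one two) three)

input:  (one two) three
        the parentheses are dropped by the ! operator, so the result is the same:
                       -> (AndNode (AndNode one two) three)

input:  (one two) OR three
        OR is matched inside orExpression and becomes the root:
                       -> (OR (AndNode one two) three)
```

Each pass through the `( ... )*` loop takes the tree built so far (`$andExpression`) and the new operand (`$a`) and places both under a fresh AndNode, which is why the trees nest to the left.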
