ANTLR: call a rule from another grammar

Question

Is it possible to call a rule from another grammar?
The goal is to have two languages in one file, where a section in the second language starts with (begin ...) and the ... is written in that second language. The grammar must call another grammar to analyze this second language.

For example:

    grammar A;

    start_rule
      : '(' 'begin' B.program ')' // or something like that
      ;

    grammar B;

    program
      : something* EOF
      ;

    something
      : ...
      ;

Sebastian Jul 11 '11 at 14:01

Bart Kiers · Accepted Answer · 2011-07-11T15:46:37+0000

Your question can be interpreted (at least) in two ways:

1. splitting the rules of a large grammar into separate grammars;
2. analyzing a separate language inside your "main" language (an island grammar).

I'm guessing it's the first, in which case you can import a grammar.

Demo for option 1:

file: L.g

    lexer grammar L;

    Digit : '0'..'9' ;

file: Sub.g

    parser grammar Sub;

    number : Digit+ ;

file: Root.g

    grammar Root;

    import Sub;

    parse
      : number EOF {System.out.println("Parsed: " + $number.text);}
      ;

file: Main.java

    import org.antlr.runtime.*;

    public class Main {
        public static void main(String[] args) throws Exception {
            L lexer = new L(new ANTLRStringStream("42"));
            CommonTokenStream tokens = new CommonTokenStream(lexer);
            RootParser parser = new RootParser(tokens);
            parser.parse();
        }
    }

Run the demo:

    bart@hades:~/Programming/ANTLR/Demos/Composite$ java -cp antlr-3.3.jar org.antlr.Tool L.g
    bart@hades:~/Programming/ANTLR/Demos/Composite$ java -cp antlr-3.3.jar org.antlr.Tool Root.g
    bart@hades:~/Programming/ANTLR/Demos/Composite$ javac -cp antlr-3.3.jar *.java
    bart@hades:~/Programming/ANTLR/Demos/Composite$ java -cp .:antlr-3.3.jar Main

which will print:

 Parsed: 42

to the console.
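To picture what the import does: as far as I know, ANTLR 3 handles import Sub; by generating a separate delegate parser for Sub and having RootParser forward calls of number to it, so the rules are reused by delegation. The plain-Java sketch below imitates that structure by hand under stated simplifications: no ANTLR runtime is used, single characters stand in for Digit tokens, and the class names SubRules and RootRules are hypothetical.

```java
// Hypothetical, hand-written analogue of the composite-grammar demo above.
// No ANTLR is involved: characters stand in for Digit tokens, and the
// class names SubRules/RootRules are illustrative only.

/** Plays the role of the imported grammar Sub:  number : Digit+ ; */
class SubRules {
    private final String input;
    private int pos; // cursor, exposed to the importing parser via position()

    SubRules(String input) { this.input = input; }

    int position() { return pos; }

    /** Matches one or more '0'..'9' characters and returns their text. */
    String number() {
        int start = pos;
        while (pos < input.length() && input.charAt(pos) >= '0' && input.charAt(pos) <= '9') {
            pos++;
        }
        if (pos == start) {
            throw new IllegalArgumentException("expected Digit at index " + start);
        }
        return input.substring(start, pos);
    }
}

/** Plays the role of grammar Root, which imports Sub. */
class RootRules {
    private final String input;
    private final SubRules sub; // the "imported" grammar, held as a delegate

    RootRules(String input) {
        this.input = input;
        this.sub = new SubRules(input);
    }

    /** parse : number EOF ;  (the call to number is forwarded to the delegate) */
    String parse() {
        String text = sub.number();
        if (sub.position() != input.length()) {
            throw new IllegalArgumentException("expected EOF at index " + sub.position());
        }
        return text;
    }

    public static void main(String[] args) {
        System.out.println("Parsed: " + new RootRules("42").parse());
    }
}
```

Running RootRules prints Parsed: 42, like the ANTLR demo; an input such as 4x is rejected because the root rule insists on EOF after the delegated number rule.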

Additional information: http://www.antlr.org/wiki/display/ANTLR3/Composite+Grammars

Demo for option 2:

A good example of a language inside another language is a regular expression. A regex has its "normal" language with its metacharacters, but it contains a second language: the one that describes a character set (or character class).

Instead of handling the metacharacters of a character set (range -, negation ^, etc.) inside your regex grammar, you can simply treat the whole character set as a single token: a [ followed by everything up to the closing ] (which may contain an escaped \]!). When you then come across a CharSet token in one of your parser rules, you call the CharSet parser on its text.
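The effect of that single-token lexer rule can be shown without ANTLR. The following is a hypothetical, hand-written counterpart of the lexer rules in Regex.g (RegexTokenizer is an illustrative name, not generated code): the outer lexer never looks inside [...], so the whole set, including an escaped \], comes out as one token whose text can later be handed to a separate CharSet parser.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical hand-written counterpart of the lexer rules in Regex.g.
 * Not ANTLR code: it only shows that a [...] set is kept as ONE token.
 */
class RegexTokenizer {

    static List<String> tokenize(String src) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (c == '[') {
                // CharSet : '[' (('\\' .) | ~('\\' | ']'))+ ']' ;
                int start = i++;
                while (i < src.length() && src.charAt(i) != ']') {
                    // an escape consumes two chars, so "\]" does not end the set
                    i += (src.charAt(i) == '\\') ? 2 : 1;
                }
                if (i >= src.length() || i == start + 1) {
                    throw new IllegalArgumentException("bad character set at index " + start);
                }
                i++; // consume the closing ']'
                tokens.add(src.substring(start, i));
            } else if (c == '\\') {
                // EscapeSeq : '\\' . ;
                if (i + 1 >= src.length()) {
                    throw new IllegalArgumentException("dangling '\\' at index " + i);
                }
                tokens.add(src.substring(i, i + 2));
                i += 2;
            } else if (c == ']') {
                throw new IllegalArgumentException("unexpected ']' at index " + i);
            } else {
                // '(' ')' '+' '*' and Other are all single-character tokens
                tokens.add(String.valueOf(c));
                i++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Java literal for the regex ((xyz)*[^\da-f])foo
        System.out.println(tokenize("((xyz)*[^\\da-f])foo"));
    }
}
```

For ((xyz)*[^\da-f])foo this yields twelve tokens, and [^\da-f] is a single one of them; what the set means is left entirely to the second parser.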

file: Regex.g

    grammar Regex;

    options {
      output=AST;
    }

    tokens { REGEX; ATOM; CHARSET; INT; GROUP; CONTENTS; }

    @members {
      public static CommonTree ast(String source) throws RecognitionException {
        RegexLexer lexer = new RegexLexer(new ANTLRStringStream(source));
        RegexParser parser = new RegexParser(new CommonTokenStream(lexer));
        return (CommonTree)parser.parse().getTree();
      }
    }

    parse
      : atom+ EOF -> ^(REGEX atom+)
      ;

    atom
      : group quantifier?     -> ^(ATOM group quantifier?)
      | EscapeSeq quantifier? -> ^(ATOM EscapeSeq quantifier?)
      | Other quantifier?     -> ^(ATOM Other quantifier?)
        // graft the tree built by CharSetParser under a CHARSET node
      | CharSet quantifier?   -> ^(CHARSET {CharSetParser.ast($CharSet.text)} quantifier?)
      ;

    group
      : '(' atom+ ')' -> ^(GROUP atom+)
      ;

    quantifier
      : '+'
      | '*'
      ;

    // a whole [...] set is a single token; its contents are parsed by CharSetParser
    CharSet
      : '[' (('\\' .) | ~('\\' | ']'))+ ']'
      ;

    EscapeSeq
      : '\\' .
      ;

    Other
      : ~('\\' | '(' | ')' | '[' | ']' | '+' | '*')
      ;

file: CharSet.g

    grammar CharSet;

    options {
      output=AST;
    }

    tokens { NORMAL_CHAR_SET; NEGATED_CHAR_SET; RANGE; }

    @members {
      public static CommonTree ast(String source) throws RecognitionException {
        CharSetLexer lexer = new CharSetLexer(new ANTLRStringStream(source));
        CharSetParser parser = new CharSetParser(new CommonTokenStream(lexer));
        return (CommonTree)parser.parse().getTree();
      }
    }

    parse
      : OSqBr
        ( normal  -> ^(NORMAL_CHAR_SET normal)
        | negated -> ^(NEGATED_CHAR_SET negated)
        )
        CSqBr
      ;

    normal
      : (EscapeSeq | Hyphen | Other) atom* Hyphen?
      ;

    negated
      : Caret normal -> normal
      ;

    atom
      : EscapeSeq
      | Caret
      | Other
      | range
      ;

    range
      : from=Other Hyphen to=Other -> ^(RANGE $from $to)
      ;

    OSqBr     : '[' ;
    CSqBr     : ']' ;
    EscapeSeq : '\\' . ;
    Caret     : '^' ;
    Hyphen    : '-' ;
    Other     : ~('-' | '\\' | '[' | ']') ;

file: Main.java

    import org.antlr.runtime.*;
    import org.antlr.runtime.tree.*;
    import org.antlr.stringtemplate.*;

    public class Main {
        public static void main(String[] args) throws Exception {
            CommonTree tree = RegexParser.ast("((xyz)*[^\\da-f])foo");
            DOTTreeGenerator gen = new DOTTreeGenerator();
            StringTemplate st = gen.toDOT(tree);
            System.out.println(st);
        }
    }

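If you run the main class, you will see the DOT output for the regex ((xyz)*[^\da-f])foo (written "((xyz)*[^\\da-f])foo" as a Java string literal), which describes the following tree:

[Image: the AST rendered from that DOT output; not preserved]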

The magic is in the atom rule of the Regex.g grammar, where I insert the tree into the rewrite rule by calling the static ast method of the CharSetParser class:

 CharSet ... -> ^(... {CharSetParser.ast($CharSet.text)} ...)

Note that there must be no semicolon inside such an action in a rewrite rule! So this would be wrong: {CharSetParser.ast($CharSet.text);}.
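The grafting itself does not depend on anything ANTLR-specific: the sub-parser returns a tree, and the outer rule hangs that tree under its own node. The plain-Java sketch below imitates the rewrite ^(CHARSET {CharSetParser.ast($CharSet.text)} quantifier?) under some assumptions: Node, CharSetSketch and GraftDemo are hypothetical names, and CharSetSketch is a simplified hand-written stand-in for CharSet.g that does not reject every input the real grammar rejects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Minimal stand-in for an AST node: a label plus child nodes (hypothetical, not CommonTree). */
class Node {
    final String label;
    final List<Node> children = new ArrayList<>();

    Node(String label, Node... kids) {
        this.label = label;
        children.addAll(Arrays.asList(kids));
    }

    /** Parenthesized form in the style of ANTLR's toStringTree(). */
    String toStringTree() {
        if (children.isEmpty()) return label;
        StringBuilder sb = new StringBuilder("(").append(label);
        for (Node kid : children) sb.append(' ').append(kid.toStringTree());
        return sb.append(')').toString();
    }
}

/** Hypothetical, simplified sub-parser for the text of one CharSet token. */
class CharSetSketch {

    // In CharSet.g, Other excludes these; '^' is lexed as Caret instead.
    private static boolean isOther(char ch) {
        return ch != '-' && ch != '\\' && ch != '[' && ch != ']' && ch != '^';
    }

    static Node ast(String text) {
        if (text.length() < 3 || text.charAt(0) != '[' || text.charAt(text.length() - 1) != ']') {
            throw new IllegalArgumentException("not a character set: " + text);
        }
        String body = text.substring(1, text.length() - 1);
        boolean negated = body.startsWith("^");
        if (negated) body = body.substring(1);
        if (body.isEmpty()) throw new IllegalArgumentException("empty set: " + text);

        Node root = new Node(negated ? "NEGATED_CHAR_SET" : "NORMAL_CHAR_SET");
        int i = 0;
        while (i < body.length()) {
            char c = body.charAt(i);
            if (c == '\\') {                                   // EscapeSeq
                if (i + 1 >= body.length()) throw new IllegalArgumentException("dangling '\\' in " + text);
                root.children.add(new Node(body.substring(i, i + 2)));
                i += 2;
            } else if (isOther(c) && i + 2 < body.length()
                    && body.charAt(i + 1) == '-' && isOther(body.charAt(i + 2))) {
                // range : from=Other Hyphen to=Other -> ^(RANGE $from $to)
                root.children.add(new Node("RANGE",
                        new Node(String.valueOf(c)), new Node(String.valueOf(body.charAt(i + 2)))));
                i += 3;
            } else {                                           // Other, Caret or Hyphen
                root.children.add(new Node(String.valueOf(c)));
                i++;
            }
        }
        return root;
    }
}

/** Mirrors: CharSet quantifier? -> ^(CHARSET {CharSetParser.ast($CharSet.text)} quantifier?) */
class GraftDemo {
    static Node charSetAtom(String charSetText, String quantifier) {
        Node atom = new Node("CHARSET", CharSetSketch.ast(charSetText)); // graft the sub-tree
        if (quantifier != null) atom.children.add(new Node(quantifier));
        return atom;
    }

    public static void main(String[] args) {
        System.out.println(charSetAtom("[^\\da-f]", null).toStringTree());
    }
}
```

Running GraftDemo prints (CHARSET (NEGATED_CHAR_SET \d (RANGE a f))), which is the same subtree that appears inside the REGEX tree printed by the walker demo in the EDIT section.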

EDIT

And here's how to create tree walkers for both grammars:

file: RegexWalker.g

    tree grammar RegexWalker;

    options {
      tokenVocab=Regex;
      ASTLabelType=CommonTree;
    }

    walk
      : ^(REGEX atom+) {System.out.println("REGEX: " + $start.toStringTree());}
      ;

    atom
      : ^(ATOM group quantifier?)
      | ^(ATOM EscapeSeq quantifier?)
      | ^(ATOM Other quantifier?)
      | ^(CHARSET t=. quantifier?) {CharSetWalker.walk($t);}
      ;

    group
      : ^(GROUP atom+)
      ;

    quantifier
      : '+'
      | '*'
      ;

file: CharSetWalker.g

    tree grammar CharSetWalker;

    options {
      tokenVocab=CharSet;
      ASTLabelType=CommonTree;
    }

    @members {
      public static void walk(CommonTree tree) {
        try {
          CommonTreeNodeStream nodes = new CommonTreeNodeStream(tree);
          CharSetWalker walker = new CharSetWalker(nodes);
          walker.walk();
        } catch(Exception e) {
          e.printStackTrace();
        }
      }
    }

    walk
      : ^(NORMAL_CHAR_SET normal)  {System.out.println("NORMAL_CHAR_SET: " + $start.toStringTree());}
      | ^(NEGATED_CHAR_SET normal) {System.out.println("NEGATED_CHAR_SET: " + $start.toStringTree());}
      ;

    normal
      : (EscapeSeq | Hyphen | Other) atom* Hyphen?
      ;

    atom
      : EscapeSeq
      | Caret
      | Other
      | range
      ;

    range
      : ^(RANGE Other Other)
      ;

file: Main.java

    import org.antlr.runtime.*;
    import org.antlr.runtime.tree.*;
    import org.antlr.stringtemplate.*;

    public class Main {
        public static void main(String[] args) throws Exception {
            CommonTree tree = RegexParser.ast("((xyz)*[^\\da-f])foo");
            CommonTreeNodeStream nodes = new CommonTreeNodeStream(tree);
            RegexWalker walker = new RegexWalker(nodes);
            walker.walk();
        }
    }

To run the demo, do:

    java -cp antlr-3.3.jar org.antlr.Tool CharSet.g
    java -cp antlr-3.3.jar org.antlr.Tool Regex.g
    java -cp antlr-3.3.jar org.antlr.Tool CharSetWalker.g
    java -cp antlr-3.3.jar org.antlr.Tool RegexWalker.g
    javac -cp antlr-3.3.jar *.java
    java -cp .:antlr-3.3.jar Main

which will print:

    NEGATED_CHAR_SET: (NEGATED_CHAR_SET \d (RANGE a f))
    REGEX: (REGEX (ATOM (GROUP (ATOM (GROUP (ATOM x) (ATOM y) (ATOM z)) *) (CHARSET (NEGATED_CHAR_SET \d (RANGE a f))))) (ATOM f) (ATOM o) (ATOM o))
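The order of the two lines is worth noting: NEGATED_CHAR_SET is printed first because the action in RegexWalker's walk rule runs only after atom+ has been matched, and it is while matching atom that the CHARSET alternative hands its subtree t to CharSetWalker.walk. The plain-Java sketch below reproduces that delegation and that order on a hand-built copy of the demo's tree. WalkNode, RegexWalkerSketch, CharSetWalkerSketch and WalkerDemo are hypothetical names, not ANTLR classes, and the sketch only visits nodes instead of matching tree patterns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Minimal AST node with an ANTLR-style toStringTree(). Hypothetical, not CommonTree. */
class WalkNode {
    final String label;
    final List<WalkNode> children;

    WalkNode(String label, WalkNode... kids) {
        this.label = label;
        this.children = Arrays.asList(kids);
    }

    String toStringTree() {
        if (children.isEmpty()) return label;
        StringBuilder sb = new StringBuilder("(").append(label);
        for (WalkNode kid : children) sb.append(' ').append(kid.toStringTree());
        return sb.append(')').toString();
    }
}

/** Stands in for CharSetWalker: reports the char-set subtree it was handed. */
class CharSetWalkerSketch {
    static void walk(WalkNode tree, List<String> out) {
        out.add(tree.label + ": " + tree.toStringTree());
    }
}

/** Stands in for RegexWalker. */
class RegexWalkerSketch {
    // walk : ^(REGEX atom+) {action} ;  the action runs after atom+ has been matched
    static void walk(WalkNode tree, List<String> out) {
        for (WalkNode kid : tree.children) atom(kid, out);
        out.add("REGEX: " + tree.toStringTree());
    }

    // descend into ATOM/GROUP nodes; hand a CHARSET's first child
    // (the t=. in ^(CHARSET t=. quantifier?)) to the other walker
    static void atom(WalkNode node, List<String> out) {
        if (node.label.equals("CHARSET")) {
            CharSetWalkerSketch.walk(node.children.get(0), out);
            return;
        }
        for (WalkNode kid : node.children) atom(kid, out);
    }
}

class WalkerDemo {
    private static WalkNode n(String label, WalkNode... kids) {
        return new WalkNode(label, kids);
    }

    /** Builds the demo's AST by hand, walks it, and returns the lines it would print. */
    static List<String> run() {
        WalkNode tree = n("REGEX",
                n("ATOM", n("GROUP",
                        n("ATOM", n("GROUP", n("ATOM", n("x")), n("ATOM", n("y")), n("ATOM", n("z"))), n("*")),
                        n("CHARSET", n("NEGATED_CHAR_SET", n("\\d"), n("RANGE", n("a"), n("f")))))),
                n("ATOM", n("f")), n("ATOM", n("o")), n("ATOM", n("o")));
        List<String> out = new ArrayList<>();
        RegexWalkerSketch.walk(tree, out);
        return out;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

Collecting the lines in a list instead of printing directly makes the order easy to check; main prints the same two lines as the ANTLR demo, in the same order.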
