ANTLR Is it possible to make a grammar with a built-in grammar inside?

ANTLR: Is it possible to make a grammar with a built-in grammar (with its own lexer) inside?

For example, in my language, I have the option to use the embedded SQL language:

    var Query = [select * from table];
    with Query do something ....;

Is this possible with ANTLR?

2 answers

Is it possible to make a grammar with a built-in grammar (with its own lexer) inside?

If you mean: is it possible to define two languages in a single grammar (using separate lexers), then the answer is no, that is not possible.

However, if the question is whether it is possible to parse two languages into a single AST, then the answer is yes, it is possible.

You just need to:

  • define each language in its own grammar;
  • create a lexer rule in your main grammar that captures the entire input of the embedded language;
  • use a rewrite rule that calls a custom method, via { ... }, which parses the embedded source with the second grammar and inserts the resulting AST into the main AST (see the expr rule in the main grammar, MyLanguage.g).
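Stripped of ANTLR, the three steps above can be sketched in plain Java. This is only an illustration under stated assumptions: the names EmbeddedAstSketch, Node, parseHost and parseEmbedded are hypothetical, the two "grammars" are reduced to regular expressions, and Node is a stand-in for CommonTree rather than anything ANTLR generates.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EmbeddedAstSketch {

    /** Minimal tree node (hypothetical stand-in for ANTLR's CommonTree). */
    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();

        Node(String label, Node... kids) {
            this.label = label;
            for (Node k : kids) children.add(k);
        }

        /** LISP-style rendering: a leaf prints its label, an inner node "(label child ...)". */
        @Override public String toString() {
            if (children.isEmpty()) return label;
            StringBuilder sb = new StringBuilder("(").append(label);
            for (Node c : children) sb.append(' ').append(c);
            return sb.append(')').toString();
        }
    }

    // Host language: "var Id = expr ;" where expr is a number or one opaque [ ... ] block.
    private static final Pattern ASSIGNMENT =
        Pattern.compile("\\s*var\\s+([A-Za-z]+)\\s*=\\s*(\\d+|\\[[^\\]]*\\])\\s*;");

    // Embedded language: only "[select * from <table>]".
    private static final Pattern SELECT =
        Pattern.compile("\\[\\s*select\\s+\\*\\s+from\\s+([A-Za-z]+)\\s*\\]");

    /** Parses the embedded block on its own; returns an error node if it does not match. */
    static Node parseEmbedded(String sqlSrc) {
        Matcher m = SELECT.matcher(sqlSrc);
        if (!m.matches()) return new Node("error: cannot parse " + sqlSrc);
        return new Node("select", new Node("*"), new Node("from"), new Node(m.group(1)));
    }

    /** Parses the host language and splices each embedded subtree into the host tree. */
    static Node parseHost(String src) {
        String s = src.trim();
        Node root = new Node("ROOT");
        Matcher m = ASSIGNMENT.matcher(s);
        int pos = 0;
        while (pos < s.length()) {
            m.region(pos, s.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("syntax error at offset " + pos);
            }
            String value = m.group(2);
            // The bracketed block is handed, as raw text, to the second parser.
            Node expr = value.startsWith("[") ? parseEmbedded(value) : new Node(value);
            root.children.add(new Node("=", new Node(m.group(1)), expr));
            pos = m.end();
        }
        return root;
    }

    public static void main(String[] args) {
        System.out.println(parseHost("var Query = [select * from table]; var x = 42;"));
    }
}
```

The point of the sketch is the hand-off in parseHost: the host side never looks inside the brackets, it only passes the raw text on and attaches whatever tree comes back, which is the role parseSQL plays in the ANTLR version.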

MyLanguage.g

    grammar MyLanguage;

    options {
      output=AST;
      ASTLabelType=CommonTree;
    }

    tokens {
      ROOT;
    }

    @members {
      private CommonTree parseSQL(String sqlSrc) {
        try {
          MiniSQLLexer lexer = new MiniSQLLexer(new ANTLRStringStream(sqlSrc));
          MiniSQLParser parser = new MiniSQLParser(new CommonTokenStream(lexer));
          return (CommonTree)parser.parse().getTree();
        } catch(Exception e) {
          return new CommonTree(new CommonToken(-1, e.getMessage()));
        }
      }
    }

    parse
      :  assignment+ EOF -> ^(ROOT assignment+)
      ;

    assignment
      :  Var Id '=' expr ';' -> ^('=' Id expr)
      ;

    expr
      :  Num
      |  SQL -> {parseSQL($SQL.text)}
      ;

    Var   : 'var';
    Id    : ('a'..'z' | 'A'..'Z')+;
    Num   : '0'..'9'+;
    SQL   : '[' ~']'* ']';
    Space : ' ' {skip();};

MiniSQL.g

    grammar MiniSQL;

    options {
      output=AST;
      ASTLabelType=CommonTree;
    }

    parse
      :  '[' statement ']' EOF -> statement
      ;

    statement
      :  select
      ;

    select
      :  Select '*' From ID -> ^(Select '*' From ID)
      ;

    Select : 'select';
    From   : 'from';
    ID     : ('a'..'z' | 'A'..'Z')+;
    Space  : ' ' {skip();};

Main.java

    import org.antlr.runtime.*;
    import org.antlr.runtime.tree.*;
    import org.antlr.stringtemplate.*;

    public class Main {
      public static void main(String[] args) throws Exception {
        String src = "var Query = [select * from table]; var x = 42;";
        MyLanguageLexer lexer = new MyLanguageLexer(new ANTLRStringStream(src));
        MyLanguageParser parser = new MyLanguageParser(new CommonTokenStream(lexer));
        CommonTree tree = (CommonTree)parser.parse().getTree();
        DOTTreeGenerator gen = new DOTTreeGenerator();
        StringTemplate st = gen.toDOT(tree);
        System.out.println(st);
      }
    }

Run the demo

    java -cp antlr-3.3.jar org.antlr.Tool MiniSQL.g
    java -cp antlr-3.3.jar org.antlr.Tool MyLanguage.g
    javac -cp antlr-3.3.jar *.java
    java -cp .:antlr-3.3.jar Main

Given input:

 var Query = [select * from table]; var x = 42; 

The Main class prints a DOT description of the following AST:

[Image: AST diagram. In LISP-style notation the tree is (ROOT (= Query (select * from table)) (= x 42)).]

If you also want to allow string literals inside your SQL (which may contain ]) and comments (which may contain ' and ]), you can use the following SQL rule in your main grammar:

    SQL
      :  '[' ( ~(']' | '\'' | '-')
             | '-' ~'-'
             | COMMENT
             | STR
             )*
         ']'
      ;

    fragment STR
      :  '\'' (~('\'' | '\r' | '\n') | '\'\'')+ '\''
      |  '\'\''
      ;

    fragment COMMENT
      :  '--' ~('\r' | '\n')*
      ;

which will correctly tokenize the following input as a single token:

    [
      select a,b,c
      from table
      where a='A''B]C'
      and b='' -- some ] comment ] here'
    ]
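To make the logic of that lexer rule easier to follow, here is a hand-written Java sketch of the same scan: find the ] that closes the block while skipping quoted strings (with '' as an escaped quote) and -- comments that run to the end of the line. The class and method names (EmbeddedBlockScanner, findBlockEnd, skipString) are hypothetical, and this is an approximation of the rule, not the code ANTLR generates. The sample is written with explicit line breaks, because a -- comment only ends at the end of its line.

```java
final class EmbeddedBlockScanner {

    /**
     * Returns the index of the ']' that closes the block opening at
     * src.charAt(start) == '[', or -1 if there is no such block or it
     * is not terminated.
     */
    static int findBlockEnd(String src, int start) {
        if (start < 0 || start >= src.length() || src.charAt(start) != '[') {
            return -1;
        }
        int i = start + 1;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (c == ']') {
                return i;                       // closing bracket outside string/comment
            } else if (c == '\'') {
                i = skipString(src, i);         // jump past the whole string literal
                if (i < 0) return -1;           // unterminated string
            } else if (c == '-' && i + 1 < src.length() && src.charAt(i + 1) == '-') {
                // line comment: ignore everything up to (not including) the line break
                while (i < src.length() && src.charAt(i) != '\r' && src.charAt(i) != '\n') {
                    i++;
                }
            } else {
                i++;                            // ordinary character, including a single '-'
            }
        }
        return -1;                              // ran off the end without finding ']'
    }

    /**
     * Skips the string literal whose opening quote is at index open.
     * Returns the index just past the closing quote, or -1 if the
     * string is unterminated or contains a line break.
     */
    private static int skipString(String src, int open) {
        int i = open + 1;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (c == '\'') {
                if (i + 1 < src.length() && src.charAt(i + 1) == '\'') {
                    i += 2;                     // '' is an escaped quote inside the string
                } else {
                    return i + 1;               // closing quote
                }
            } else if (c == '\r' || c == '\n') {
                return -1;                      // strings may not span lines
            } else {
                i++;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String sample = "[\n  select a,b,c\n  from table\n  where a='A''B]C'\n"
                      + "  and b='' -- some ] comment ] here'\n]";
        int end = findBlockEnd(sample, 0);
        System.out.println(end == sample.length() - 1
            ? "whole block is one token"
            : "block ended early or was not terminated");
    }
}
```

If the same text is put on a single line, the -- comment swallows the final ] and findBlockEnd returns -1, which is why the line break before the closing bracket matters.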

Be aware that writing a grammar for an entire SQL dialect (or even a large subset) is not a trivial task! You might want to look for existing SQL parsers or browse the ANTLR wiki for sample grammars.


Yes, with ANTLR this is called an island grammar. You can find a working example in the island-grammar folder of the v3 examples: it shows how to use a second grammar to parse Javadoc comments inside Java code.

You can also find some hints in the "Island Grammars Under Parser Control" document and in another related one.
