Is it possible to write a grammar that contains an embedded grammar (with its own lexer)?
If you mean whether it is possible to define two languages in a single grammar (using separate lexers), then the answer is: no, that is not possible.
However, if the question is whether it is possible to parse two languages into a single AST, then the answer is: yes, it is possible.
You just need to:
- define each language in its own grammar;
- create a lexer rule in your main grammar that captures the entire source of the embedded language;
- use a rewrite rule that calls a custom method, which parses the embedded source and inserts the resulting AST into the main AST using { ... } (see the expr rule in the main grammar, MyLanguage.g).
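The three steps above can also be illustrated without ANTLR. The following is a minimal plain-Java sketch, not the ANTLR-generated code: the names `EmbeddingSketch`, `Node`, `parseMain` and `parseSql` are hypothetical, and the hand-written parsers assume well-formed input of the toy shape used in this answer. It only shows the outer parser capturing the bracketed text as one chunk, handing it to a separate parser, and grafting the returned subtree into its own tree.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout: a hand-written stand-in for the two
// ANTLR-generated parsers, limited to well-formed toy input.
class EmbeddingSketch {

    // A minimal tree node, playing the role of ANTLR's CommonTree.
    static class Node {
        final String text;
        final List<Node> children = new ArrayList<>();

        Node(String text, Node... kids) {
            this.text = text;
            for (Node k : kids) children.add(k);
        }

        @Override
        public String toString() {
            if (children.isEmpty()) return text;
            StringBuilder sb = new StringBuilder("(").append(text);
            for (Node c : children) sb.append(' ').append(c);
            return sb.append(')').toString();
        }
    }

    // Embedded language: turns "[select * from <name>]" into a subtree.
    static Node parseSql(String src) {
        String body = src.substring(1, src.length() - 1).trim();
        String[] words = body.split("\\s+");
        if (words.length != 4 || !words[0].equals("select")
                || !words[1].equals("*") || !words[2].equals("from")) {
            // Like the answer's parseSQL, report a failure as a single node.
            return new Node("error: cannot parse " + src);
        }
        return new Node(words[0], new Node(words[1]), new Node(words[2]), new Node(words[3]));
    }

    // Main language: a sequence of "var <id> = <number or [sql]>;" statements.
    static Node parseMain(String src) {
        Node root = new Node("ROOT");
        int pos = 0;
        while (true) {
            int eq = src.indexOf('=', pos);
            if (eq < 0) break; // no more assignments
            // The text before '=' is "var <id>"; keep only the identifier.
            String id = src.substring(pos, eq).trim().replaceFirst("^var\\s+", "");
            int start = eq + 1;
            while (src.charAt(start) == ' ') start++;
            Node value;
            int end;
            if (src.charAt(start) == '[') {
                // Capture everything up to the closing ']' as one chunk ...
                end = src.indexOf(']', start) + 1;
                // ... and let the embedded parser build the subtree to graft in.
                value = parseSql(src.substring(start, end));
            } else {
                end = src.indexOf(';', start);
                value = new Node(src.substring(start, end).trim());
            }
            root.children.add(new Node("=", new Node(id), value));
            pos = src.indexOf(';', end) + 1;
        }
        return root;
    }

    public static void main(String[] args) {
        System.out.println(parseMain("var Query = [select * from table]; var x = 42;"));
    }
}
```

In the resulting tree, the subtree built by `parseSql` hangs directly under the `=` node of the assignment, which is the same effect the `{ ... }` rewrite has in the ANTLR version below.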
MyLanguage.g
    grammar MyLanguage;

    options {
      output=AST;
      ASTLabelType=CommonTree;
    }

    tokens {
      ROOT;
    }

    @members {
      private CommonTree parseSQL(String sqlSrc) {
        try {
          MiniSQLLexer lexer = new MiniSQLLexer(new ANTLRStringStream(sqlSrc));
          MiniSQLParser parser = new MiniSQLParser(new CommonTokenStream(lexer));
          return (CommonTree)parser.parse().getTree();
        } catch(Exception e) {
          return new CommonTree(new CommonToken(-1, e.getMessage()));
        }
      }
    }

    parse
      :  assignment+ EOF -> ^(ROOT assignment+)
      ;

    assignment
      :  Var Id '=' expr ';' -> ^('=' Id expr)
      ;

    expr
      :  Num
      |  SQL -> {parseSQL($SQL.text)}
      ;

    Var   : 'var';
    Id    : ('a'..'z' | 'A'..'Z')+;
    Num   : '0'..'9'+;
    SQL   : '[' ~']'* ']';
    Space : ' ' {skip();};
MiniSQL.g
    grammar MiniSQL;

    options {
      output=AST;
      ASTLabelType=CommonTree;
    }

    parse
      :  '[' statement ']' EOF -> statement
      ;

    statement
      :  select
      ;

    select
      :  Select '*' From ID -> ^(Select '*' From ID)
      ;

    Select : 'select';
    From   : 'from';
    ID     : ('a'..'z' | 'A'..'Z')+;
    Space  : ' ' {skip();};
Main.java
    import org.antlr.runtime.*;
    import org.antlr.runtime.tree.*;
    import org.antlr.stringtemplate.*;

    public class Main {
      public static void main(String[] args) throws Exception {
        String src = "var Query = [select * from table]; var x = 42;";
        MyLanguageLexer lexer = new MyLanguageLexer(new ANTLRStringStream(src));
        MyLanguageParser parser = new MyLanguageParser(new CommonTokenStream(lexer));
        CommonTree tree = (CommonTree)parser.parse().getTree();
        DOTTreeGenerator gen = new DOTTreeGenerator();
        StringTemplate st = gen.toDOT(tree);
        System.out.println(st);
      }
    }
Run the demo
    java -cp antlr-3.3.jar org.antlr.Tool MiniSQL.g
    java -cp antlr-3.3.jar org.antlr.Tool MyLanguage.g
    javac -cp antlr-3.3.jar *.java
    java -cp .:antlr-3.3.jar Main
Given input:
var Query = [select * from table]; var x = 42;
the Main class prints DOT source that corresponds to the following AST, shown here in text form:

    (ROOT (= Query (select * from table)) (= x 42))
If you also want to allow string literals inside your SQL (which could contain ]) and comments (which could contain ' and ]), you can use the following SQL rule in your main grammar:
    SQL
      :  '[' ( ~(']' | '\'' | '-')
             | '-' ~'-'
             | COMMENT
             | STR
             )*
         ']'
      ;

    fragment STR
      :  '\'' (~('\'' | '\r' | '\n') | '\'\'')+ '\''
      |  '\'\''
      ;

    fragment COMMENT
      :  '--' ~('\r' | '\n')*
      ;
which correctly tokenizes the following input as a single token:
[ select a,b,c from table where a='A''B]C' and b=''
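To make the skipping logic of that SQL rule explicit, here is a hand-written Java counterpart. It is a sketch under assumptions: the names `SqlChunkScanner` and `endOfChunk` are hypothetical and not part of ANTLR, and unlike the STR fragment it does not reject line breaks inside string literals.

```java
// Hypothetical helper: a hand-written counterpart of the SQL lexer rule,
// shown only to spell out how ']' inside strings and comments is skipped.
class SqlChunkScanner {

    // Returns the index just past the ']' that closes the chunk opened by
    // the '[' at src[start], or -1 if the chunk is never closed.
    static int endOfChunk(String src, int start) {
        int i = start + 1;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (c == ']') {
                return i + 1;
            } else if (c == '\'') {
                // String literal: skip to the closing quote; '' is an escaped quote.
                i++;
                while (i < src.length()) {
                    if (src.charAt(i) == '\'') {
                        if (i + 1 < src.length() && src.charAt(i + 1) == '\'') {
                            i += 2; // escaped quote, still inside the literal
                            continue;
                        }
                        break; // closing quote found
                    }
                    i++;
                }
                if (i >= src.length()) return -1; // unterminated string literal
                i++; // step past the closing quote
            } else if (c == '-' && i + 1 < src.length() && src.charAt(i + 1) == '-') {
                // Line comment: skip everything up to the end of the line.
                while (i < src.length() && src.charAt(i) != '\n' && src.charAt(i) != '\r') i++;
            } else {
                i++; // ordinary character, including a single '-'
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String src = "[a='A''B]C' and b='']";
        System.out.println(endOfChunk(src, 0) == src.length());
    }
}
```

A scanner like this could be used to check inputs by hand; in the ANTLR version the generated lexer does the equivalent work when it matches the SQL token.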
Be aware that writing a grammar for an entire SQL dialect (or even a large subset of one) is not a trivial task! You might want to look for existing SQL parsers, or browse the ANTLR wiki for sample grammars.