String Interpolation in ANTLR

I am working on a simple string-processing DSL for internal purposes, and I would like the language to support string interpolation, as Ruby does.

For instance:

    name = "Bob"
    msg = "Hello ${name}!"
    print(msg)  # prints "Hello Bob!"

I'm trying to implement my parser in ANTLR v3, but I'm fairly inexperienced with ANTLR, so I'm not sure how to implement this feature. So far, I have defined my string literals in the lexer, but in this case I will obviously need to handle the contents of the interpolation in the parser.

My current string literal grammar looks like this:

    STRINGLITERAL
        :   '"'
            ( StringEscapeSeq
            | ~( '\\' | '"' | '\r' | '\n' )
            )*
            '"'
        ;

    fragment StringEscapeSeq
        :   '\\'
            ( 't' | 'n' | 'r' | '"' | '\\' | '$' | ('0'..'9') )
        ;
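With string literals lexed as a single STRINGLITERAL token like this, one possible route (a different technique from the one in the answer, offered only as a sketch) is to leave the lexer rule alone and expand ${...} in a second pass over the token text, for example from a parser action. The helper StringLiteralInterpolator.expand is hypothetical. It follows the escape set of StringEscapeSeq, treats \0 to \9 as the bare digit (an assumption, since the grammar does not say what those escapes mean), and throws on an undefined variable or an unterminated ${:

```java
import java.util.Map;

/**
 * Hypothetical helper (not part of ANTLR): takes the raw text of a
 * STRINGLITERAL token, quotes included, and expands ${name} using vars.
 */
final class StringLiteralInterpolator {

    private StringLiteralInterpolator() {}

    static String expand(String tokenText, Map<String, String> vars) {
        // Strip the opening and closing quote matched by the lexer rule.
        String body = tokenText.substring(1, tokenText.length() - 1);
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < body.length()) {
            char c = body.charAt(i);
            if (c == '\\') {
                // The lexer rule guarantees a valid escape char follows a backslash.
                char e = body.charAt(i + 1);
                switch (e) {
                    case 't': out.append('\t'); break;
                    case 'n': out.append('\n'); break;
                    case 'r': out.append('\r'); break;
                    // \" \\ \$ yield the char itself; digits too (assumption).
                    default:  out.append(e);    break;
                }
                i += 2;
            } else if (c == '$' && i + 1 < body.length() && body.charAt(i + 1) == '{') {
                int close = body.indexOf('}', i + 2);
                if (close < 0) {
                    throw new IllegalArgumentException("unterminated ${ at offset " + i);
                }
                String name = body.substring(i + 2, close);
                String value = vars.get(name);
                if (value == null) {
                    throw new IllegalArgumentException("undefined variable: " + name);
                }
                out.append(value);
                i = close + 1;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }
}
```

This keeps the lexer unchanged, but it only handles a bare variable name inside ${...}. Supporting arbitrary expressions there would mean running the extracted text back through a parser.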

Moving the processing of string literals into the parser, on the other hand, seems to break everything else. A quick web search turned up nothing useful. Any suggestions on how to get started?

1 answer

I am not an ANTLR expert, but here is a possible grammar:

    grammar Str;

    parse
        :   ((Space)* statement (Space)* ';')+ (Space)* EOF
        ;

    statement
        :   print | assignment
        ;

    print
        :   'print' '(' (Identifier | stringLiteral) ')'
        ;

    assignment
        :   Identifier (Space)* '=' (Space)* stringLiteral
        ;

    stringLiteral
        :   '"' (Identifier | EscapeSequence | NormalChar | Space | Interpolation)* '"'
        ;

    Interpolation
        :   '${' Identifier '}'
        ;

    Identifier
        :   ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '_' | '0'..'9')*
        ;

    EscapeSequence
        :   '\\' SpecialChar
        ;

    SpecialChar
        :   '"' | '\\' | '$'
        ;

    Space
        :   (' ' | '\t' | '\r' | '\n')
        ;

    NormalChar
        :   ~SpecialChar
        ;

As you may have noticed, the grammar contains several (Space)* sub-rules. They are needed because stringLiteral is a parser rule rather than a lexer rule: when the source is tokenized, the lexer cannot tell whether a space is part of a string literal or just whitespace in the source file that could be skipped.
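The underlying problem is that this lexer has no notion of context. A lexer that tracks whether it is currently inside a string can keep whitespace there and drop it everywhere else; ANTLR 4 offers lexer modes for exactly this. The sketch below is not ANTLR code but a hypothetical hand-written Java tokenizer with an inString flag, simplified to ignore escapes and interpolation, that only illustrates the idea:

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical tokenizer (not ANTLR) with two modes: outside quotes
 * whitespace is skipped, inside quotes every character is kept.
 * Escapes and ${...} are deliberately not handled here.
 */
final class ModalTokenizer {

    private ModalTokenizer() {}

    static List<String> tokenize(String src) {
        List<String> tokens = new ArrayList<>();
        boolean inString = false;
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (c == '"') {
                // A quote toggles between default mode and string mode.
                tokens.add("QUOTE");
                inString = !inString;
                i++;
            } else if (inString) {
                // String mode: whitespace is significant and becomes a CHAR token.
                tokens.add("CHAR(" + c + ")");
                i++;
            } else if (Character.isWhitespace(c)) {
                // Default mode: whitespace never reaches the parser.
                i++;
            } else if (Character.isLetter(c)) {
                int start = i;
                while (i < src.length() && Character.isLetterOrDigit(src.charAt(i))) {
                    i++;
                }
                tokens.add("ID(" + src.substring(start, i) + ")");
            } else {
                tokens.add("SYM(" + c + ")");
                i++;
            }
        }
        return tokens;
    }
}
```

Because spaces outside quotes are dropped before parsing, rules such as assignment would not need (Space)* sub-rules with a lexer of this kind.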

I tested the example with a small Java class, and everything worked as expected:

    /* the same grammar, but now with a bit of Java code in it */
    grammar Str;

    @parser::header {
        package antlrdemo;
        import java.util.HashMap;
    }

    @lexer::header {
        package antlrdemo;
    }

    @parser::members {
        HashMap<String, String> vars = new HashMap<String, String>();
    }

    parse
        :   ((Space)* statement (Space)* ';')+ (Space)* EOF
        ;

    statement
        :   print | assignment
        ;

    print
        :   'print' '('
            (   id=Identifier    {System.out.println("> "+vars.get($id.text));}
            |   st=stringLiteral {System.out.println("> "+$st.value);}
            )
            ')'
        ;

    assignment
        :   id=Identifier (Space)* '=' (Space)* st=stringLiteral {vars.put($id.text, $st.value);}
        ;

    stringLiteral returns [String value]
        :   '"' {StringBuilder b = new StringBuilder();}
            (   id=Identifier           {b.append($id.text);}
            |   es=EscapeSequence       {b.append($es.text);}
            |   ch=(NormalChar | Space) {b.append($ch.text);}
            |   in=Interpolation        {b.append(vars.get($in.text.substring(2, $in.text.length()-1)));}
            )*
            '"' {$value = b.toString();}
        ;

    Interpolation
        :   '${' i=Identifier '}'
        ;

    Identifier
        :   ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '_' | '0'..'9')*
        ;

    EscapeSequence
        :   '\\' SpecialChar
        ;

    SpecialChar
        :   '"' | '\\' | '$'
        ;

    Space
        :   (' ' | '\t' | '\r' | '\n')
        ;

    NormalChar
        :   ~SpecialChar
        ;
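The stringLiteral action simply concatenates token texts, except for Interpolation, whose variable name is recovered by dropping the leading ${ and the trailing }. A plain-Java model of that loop follows; the Tok record and Kind enum are hypothetical stand-ins for ANTLR's tokens (records need Java 16 or later):

```java
import java.util.List;
import java.util.Map;

/**
 * Plain-Java model of the stringLiteral action. Tok and Kind are
 * hypothetical; ANTLR would supply real tokens instead.
 */
final class StringLiteralAction {

    enum Kind { IDENTIFIER, ESCAPE_SEQUENCE, NORMAL_CHAR, SPACE, INTERPOLATION }

    record Tok(Kind kind, String text) {}

    private StringLiteralAction() {}

    static String value(List<Tok> tokens, Map<String, String> vars) {
        StringBuilder b = new StringBuilder();
        for (Tok t : tokens) {
            if (t.kind() == Kind.INTERPOLATION) {
                // "${name}" -> "name": drop the leading "${" and the trailing "}".
                String name = t.text().substring(2, t.text().length() - 1);
                // Like the grammar action, this appends "null" for an undefined variable.
                b.append(vars.get(name));
            } else {
                // Identifier, EscapeSequence, NormalChar and Space are copied verbatim.
                b.append(t.text());
            }
        }
        return b.toString();
    }
}
```

One consequence worth knowing: vars.get returns null for an undefined variable, and StringBuilder.append(String) turns that into the text "null", so "Hi ${who}" would silently become "Hi null".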

And a class with the main method for checking everything:

    package antlrdemo;

    import org.antlr.runtime.*;

    public class ANTLRDemo {
        public static void main(String[] args) throws RecognitionException {
            // Four statements: two assignments and two prints.
            String source =
                    "name = \"Bob\"; \n" +
                    "msg = \"Hello ${name}\"; \n" +
                    "print(msg); \n" +
                    "print(\"Bye \\${for} now!\"); ";
            // Standard ANTLR 3 pipeline: char stream -> lexer -> token stream -> parser.
            ANTLRStringStream in = new ANTLRStringStream(source);
            StrLexer lexer = new StrLexer(in);
            CommonTokenStream tokens = new CommonTokenStream(lexer);
            StrParser parser = new StrParser(tokens);
            // The embedded actions print each value as they run.
            parser.parse();
        }
    }

which produces the following output:

    > Hello Bob
    > Bye \${for} now!
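In the second line the backslash survives because the EscapeSequence alternative appends $es.text, i.e. the whole two-character token \$. If \$ should instead produce a literal $, one possible (untested) change is to append $es.text.substring(1) in that action. The Java sketch below models both behaviours on the tokens of Bye \${for} now!; the {kind, text} string-array token representation and EscapeHandling.render are hypothetical:

```java
import java.util.List;

/**
 * Hypothetical model of the EscapeSequence handling. Each token is
 * {kind, text}; only "ESC" tokens are treated specially.
 */
final class EscapeHandling {

    private EscapeHandling() {}

    static String render(List<String[]> tokens, boolean dropBackslash) {
        StringBuilder b = new StringBuilder();
        for (String[] tok : tokens) {
            String kind = tok[0];
            String text = tok[1];
            if (kind.equals("ESC") && dropBackslash) {
                // Variant: "\$" contributes just "$".
                b.append(text.substring(1));
            } else {
                // Original answer: every token's full text is appended.
                b.append(text);
            }
        }
        return b.toString();
    }
}
```

Either way the ${for} is not interpolated, which is the point of the escape; the variant only changes whether the backslash shows up in the result.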

Again, I am not an expert, but this should at least give you one way to solve the problem.

Hope this helps.
