I am not an ANTLR expert, but here is a possible grammar:
    grammar Str;

    parse
      : ((Space)* statement (Space)* ';')+ (Space)* EOF
      ;

    statement
      : print | assignment
      ;

    print
      : 'print' '(' (Identifier | stringLiteral) ')'
      ;

    assignment
      : Identifier (Space)* '=' (Space)* stringLiteral
      ;

    stringLiteral
      : '"' (Identifier | EscapeSequence | NormalChar | Space | Interpolation)* '"'
      ;

    Interpolation
      : '${' Identifier '}'
      ;

    Identifier
      : ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '_' | '0'..'9')*
      ;

    EscapeSequence
      : '\\' SpecialChar
      ;

    SpecialChar
      : '"' | '\\' | '$'
      ;

    Space
      : (' ' | '\t' | '\r' | '\n')
      ;

    NormalChar
      : ~SpecialChar
      ;
As you may have noticed, the grammar contains several (Space)* occurrences. This is because stringLiteral is a parser rule rather than a lexer rule. As a result, when the source is tokenized, the lexer cannot tell whether a space belongs to a string literal or is just insignificant whitespace between tokens. It therefore cannot discard spaces, and the parser rules have to allow them explicitly.
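To illustrate why the spaces have to survive tokenization, here is a minimal sketch in plain Java. It does not use ANTLR; the class SpaceTokenSketch and its tokenize method are hypothetical names made up for this illustration, and the toy tokenizer is far simpler than a real lexer. It only shows that once whitespace is dropped at the token level, the text of a string literal can no longer be rebuilt.

```java
import java.util.ArrayList;
import java.util.List;

public class SpaceTokenSketch {
    // Hypothetical toy tokenizer (not ANTLR): emits each whitespace character
    // as its own token, and each run of non-whitespace characters as one token.
    // When skipSpaces is true, whitespace tokens are discarded.
    static List<String> tokenize(String src, boolean skipSpaces) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                if (!skipSpaces) {
                    tokens.add(String.valueOf(c));
                }
                i++;
            } else {
                int start = i;
                while (i < src.length() && !Character.isWhitespace(src.charAt(i))) {
                    i++;
                }
                tokens.add(src.substring(start, i));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String src = "msg = \"Hello  Bob\"";
        // Spaces dropped: the two spaces inside the literal are lost as well.
        System.out.println(String.join("", tokenize(src, true)));
        // Spaces kept as tokens: the original text can be reassembled.
        System.out.println(String.join("", tokenize(src, false)));
    }
}
```

The first line printed is msg="HelloBob" and the second is msg = "Hello  Bob". Keeping Space as a real token is what lets the stringLiteral rule see the spaces, at the cost of sprinkling (Space)* through the other parser rules.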
I tested the grammar by adding embedded Java actions to it, and everything worked as expected:
    grammar Str;

    @parser::header {
      package antlrdemo;
      import java.util.HashMap;
    }

    @lexer::header {
      package antlrdemo;
    }

    @parser::members {
      HashMap<String, String> vars = new HashMap<String, String>();
    }

    parse
      : ((Space)* statement (Space)* ';')+ (Space)* EOF
      ;

    statement
      : print | assignment
      ;

    print
      : 'print' '('
        ( id=Identifier {System.out.println("> "+vars.get($id.text));}
        | st=stringLiteral {System.out.println("> "+$st.value);}
        )
        ')'
      ;

    assignment
      : id=Identifier (Space)* '=' (Space)* st=stringLiteral {vars.put($id.text, $st.value);}
      ;

    stringLiteral returns [String value]
      : '"' {StringBuilder b = new StringBuilder();}
        ( id=Identifier {b.append($id.text);}
        | es=EscapeSequence {b.append($es.text);}
        | ch=(NormalChar | Space) {b.append($ch.text);}
        | in=Interpolation {b.append(vars.get($in.text.substring(2, $in.text.length()-1)));}
        )*
        '"' {$value = b.toString();}
      ;

    Interpolation
      : '${' i=Identifier '}'
      ;

    Identifier
      : ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '_' | '0'..'9')*
      ;

    EscapeSequence
      : '\\' SpecialChar
      ;

    SpecialChar
      : '"' | '\\' | '$'
      ;

    Space
      : (' ' | '\t' | '\r' | '\n')
      ;

    NormalChar
      : ~SpecialChar
      ;
And a class with the main method for checking everything:
    package antlrdemo;

    import org.antlr.runtime.*;

    public class ANTLRDemo {
        public static void main(String[] args) throws RecognitionException {
            String source = "name = \"Bob\";        \n"+
                            "msg = \"Hello ${name}\"; \n"+
                            "print(msg);            \n"+
                            "print(\"Bye \\${for} now!\"); ";
            ANTLRStringStream in = new ANTLRStringStream(source);
            StrLexer lexer = new StrLexer(in);
            CommonTokenStream tokens = new CommonTokenStream(lexer);
            StrParser parser = new StrParser(tokens);
            parser.parse();
        }
    }
which produces the following output:
    > Hello Bob
    > Bye \${for} now!
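For readers who want to see the substitution logic in isolation, the following is a rough sketch in plain Java of what the actions in the stringLiteral rule appear to do: ${name} is replaced by a lookup in the variable map, and an escape sequence is copied through with its backslash, as the $es.text append does. The class InterpolationSketch and the method expand are hypothetical names for this illustration only; they are not part of the generated parser, and the sketch does not enforce the Identifier rule on the placeholder name.

```java
import java.util.HashMap;
import java.util.Map;

public class InterpolationSketch {
    // Hypothetical helper (not generated by ANTLR): expands ${name}
    // placeholders in the body of a string literal using the given variables.
    static String expand(String body, Map<String, String> vars) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < body.length()) {
            char c = body.charAt(i);
            if (c == '\\' && i + 1 < body.length()) {
                // Escape sequence: copied verbatim, backslash included,
                // mirroring the grammar's append of the escape token text.
                out.append(c).append(body.charAt(i + 1));
                i += 2;
            } else if (c == '$' && i + 1 < body.length() && body.charAt(i + 1) == '{') {
                int close = body.indexOf('}', i + 2);
                if (close < 0) {
                    throw new IllegalArgumentException("unterminated placeholder at index " + i);
                }
                String name = body.substring(i + 2, close);
                // As in the grammar's action, an unknown variable yields "null".
                out.append(vars.get(name));
                i = close + 1;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> vars = new HashMap<>();
        vars.put("name", "Bob");
        System.out.println("> " + expand("Hello ${name}", vars));
        System.out.println("> " + expand("Bye \\${for} now!", vars));
    }
}
```

Run on the two literals from the demo, this prints the same two lines as the parser output above. If the backslash should be dropped from the result, the escape branch would append only the second character; the same change would apply to the action on EscapeSequence in the grammar.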
Again, I am not an expert, but this (at least) gives you a way to solve the problem.
HTH.