How to get full user statements (including spaces) in ANTLR

I have a "statement" rule, taken from a Java grammar, defined as follows.

    statement
        : block
        | ASSERT expression (':' expression)? ';'
        | 'if' parExpression statement ('else' statement)?
        | 'for' '(' forControl ')' statement
        | 'while' parExpression statement
        | 'do' statement 'while' parExpression ';'
        | 'try' block ( catches 'finally' block | catches | 'finally' block )
        | 'switch' parExpression switchBlock
        | 'synchronized' parExpression block
        | 'return' expression? ';'
        | 'throw' expression ';'
        | 'break' Identifier? ';'
        | 'continue' Identifier? ';'
        | ';'
        | statementExpression ';'
        | Identifier ':' statement
        ;

When I run the parser, I also want to print each statement exactly as the user wrote it (with the whitespace preserved), for example:

 Object o = Ma.addToObj(r1); if(h.isFull() && !h.contains(true)) h.update(o); 

But when I call "getText()" in "exitStatement", I only get the statements with all whitespace removed, for example:

 Objecto=Ma.addToObj(r1); if(h.isFull()&&!h.contains(true))h.update(o); 

Is there a simple way to get the full statements as written, including the whitespace inside them? Many thanks!

Full code:

    import java.io.FileInputStream;
    import java.io.InputStream;

    import org.antlr.v4.runtime.ANTLRInputStream;
    import org.antlr.v4.runtime.CommonTokenStream;
    import org.antlr.v4.runtime.tree.ParseTree;
    import org.antlr.v4.runtime.tree.ParseTreeWalker;

    public class PrintStatements {

        public static class GetStatements extends sdlParserBaseListener {
            StringBuilder statements = new StringBuilder();

            // Called after each statement is parsed; getText() drops the whitespace.
            public void exitStatement(sdlParserParser.StatementContext ctx) {
                statements.append(ctx.getText());
                statements.append("\n");
            }
        }

        public static void main(String[] args) throws Exception {
            // Read from the file given as the first argument, or from stdin.
            String inputFile = null;
            if ( args.length>0 ) inputFile = args[0];
            InputStream is = System.in;
            if ( inputFile!=null ) {
                is = new FileInputStream(inputFile);
            }
            ANTLRInputStream input = new ANTLRInputStream(is);
            sdlParserLexer lexer = new sdlParserLexer(input);
            CommonTokenStream tokens = new CommonTokenStream(lexer);
            sdlParserParser parser = new sdlParserParser(tokens);
            ParseTree tree = parser.s();

            // create a standard ANTLR parse tree walker
            ParseTreeWalker walker = new ParseTreeWalker();
            // create listener then feed to walker
            GetStatements loader = new GetStatements();
            walker.walk(loader, tree); // walk parse tree
            System.out.println(loader.statements.toString());
        }
    }
4 answers

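I solved this problem by calling tokens.getText() on the statement's context from the listener method of the enclosing rule. Unlike ctx.getText(), the token stream also returns the text of tokens on the hidden channel, so the whitespace is preserved as long as the grammar sends it to the hidden channel instead of skipping it. For example: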

    public void exitE(sdlParserParser.EContext ctx) {
        TokenStream tokens = parser.getTokenStream();
        String Stmt = null;
        Stmt = tokens.getText(ctx.statement());
        ...
    }
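Why the token stream gives back the whitespace can be shown with a toy model in plain Java. This is only a sketch, not ANTLR's implementation: HiddenChannelDemo, Tok, treeText and bufferText are hypothetical names, and the lexer here simply splits on whitespace. It assumes whitespace is kept on a hidden channel rather than discarded.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only; the names below are illustrative, not ANTLR classes.
final class HiddenChannelDemo {
    static final int DEFAULT = 0;
    static final int HIDDEN = 1;

    // Minimal token: its text plus the channel the lexer put it on.
    static final class Tok {
        final String text;
        final int channel;

        Tok(String text, int channel) {
            this.text = text;
            this.channel = channel;
        }
    }

    // Splits src into alternating runs; whitespace runs go to the hidden channel.
    static List<Tok> lex(String src) {
        List<Tok> out = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            boolean ws = Character.isWhitespace(src.charAt(i));
            int j = i;
            while (j < src.length() && Character.isWhitespace(src.charAt(j)) == ws) {
                j++;
            }
            out.add(new Tok(src.substring(i, j), ws ? HIDDEN : DEFAULT));
            i = j;
        }
        return out;
    }

    // Analogue of ctx.getText(): joins only default-channel tokens in [start, stop].
    static String treeText(List<Tok> buf, int start, int stop) {
        StringBuilder sb = new StringBuilder();
        for (int k = start; k <= stop; k++) {
            if (buf.get(k).channel == DEFAULT) {
                sb.append(buf.get(k).text);
            }
        }
        return sb.toString();
    }

    // Analogue of tokens.getText(ctx): joins every buffered token in [start, stop].
    static String bufferText(List<Tok> buf, int start, int stop) {
        StringBuilder sb = new StringBuilder();
        for (int k = start; k <= stop; k++) {
            sb.append(buf.get(k).text);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Tok> buf = lex("Object o = Ma.addToObj(r1);");
        int last = buf.size() - 1;
        System.out.println(treeText(buf, 0, last));   // Objecto=Ma.addToObj(r1);
        System.out.println(bufferText(buf, 0, last)); // Object o = Ma.addToObj(r1);
    }
}
```

The first and last token indexes play the role of the context's start and stop tokens; everything buffered between them is returned, whether or not the parser ever saw it.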

I'm new to ANTLR, so I may be getting something wrong ...

I don't know the exact way to do this, but you could try something like the following. Your grammar file probably has a rule like this:

    WS : (' '|'\r'|'\t'|'\u000C'|'\n')
         {
             if (!preserveWhitespacesAndComments) {
                 skip();
             } else {
                 $channel = HIDDEN;
             }
         }
       ;

This lexer rule makes the parser ignore whitespace. More precisely, the whitespace tokens are either discarded by skip() or sent to the HIDDEN channel, where the parser does not see them. If you comment out these lines of code

    WS : (' '|'\r'|'\t'|'\u000C'|'\n')
         {
             if (!preserveWhitespacesAndComments) {
                 // skip();
             } else {
                 // $channel = HIDDEN;
             }
         }
       ;

all whitespace tokens will be sent to the parser, but then you need to rewrite the parser rules so that they expect whitespace anywhere it may appear:

 Object(EXPECT WHITESPACE)o(EXPECT WHITESPACE)=(EXPECT WHITESPACE)Ma.addToObj(r1); 

Otherwise, the parser will report errors.
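The difference between the two branches of the WS rule can be sketched in plain Java. This is a toy lexer under my own assumptions, not ANTLR code: WhitespacePolicyDemo, Tok, parserView and bufferText are made-up names, and tokens are just runs of non-whitespace characters. It suggests that with either branch the parser sees the same tokens, but only the hidden-channel branch keeps the whitespace somewhere it can be read back from.

```java
import java.util.ArrayList;
import java.util.List;

// Toy lexer (not ANTLR): the same input under a "skip" and a "hidden channel" policy.
final class WhitespacePolicyDemo {
    // Minimal token: its text plus a flag saying whether the parser may see it.
    static final class Tok {
        final String text;
        final boolean hidden;

        Tok(String text, boolean hidden) {
            this.text = text;
            this.hidden = hidden;
        }
    }

    // skipWhitespace=true drops whitespace tokens; false keeps them, marked hidden.
    static List<Tok> lex(String src, boolean skipWhitespace) {
        List<Tok> out = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            boolean ws = Character.isWhitespace(src.charAt(i));
            int j = i;
            while (j < src.length() && Character.isWhitespace(src.charAt(j)) == ws) {
                j++;
            }
            String text = src.substring(i, j);
            if (!ws) {
                out.add(new Tok(text, false));
            } else if (!skipWhitespace) {
                out.add(new Tok(text, true));
            }
            i = j;
        }
        return out;
    }

    // What the parser would consume: visible tokens only, separated by '|'.
    static String parserView(List<Tok> toks) {
        StringBuilder sb = new StringBuilder();
        for (Tok t : toks) {
            if (!t.hidden) {
                if (sb.length() > 0) {
                    sb.append('|');
                }
                sb.append(t.text);
            }
        }
        return sb.toString();
    }

    // Everything still held in the token buffer, concatenated in order.
    static String bufferText(List<Tok> toks) {
        StringBuilder sb = new StringBuilder();
        for (Tok t : toks) {
            sb.append(t.text);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String src = "h.update(o); return x;";
        System.out.println(parserView(lex(src, true)));   // h.update(o);|return|x;
        System.out.println(parserView(lex(src, false)));  // h.update(o);|return|x;
        System.out.println(bufferText(lex(src, true)));   // h.update(o);returnx;
        System.out.println(bufferText(lex(src, false)));  // h.update(o); return x;
    }
}
```

If this model is right, keeping the hidden-channel branch active leaves the parser rules untouched while the original spacing stays available in the token buffer.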


You need one of two things:

  • The ability to take file position data for the first and last tokens received as a result of parsing (either tokens or tree nodes), and go to the source file and extract the text. This will give you the original spaces.
  • A prettyprinter that regenerates the text from the AST, inserting appropriate whitespace. See my SO answer on how to build a prettyprinter.
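The first option can be sketched in plain Java. This is a minimal illustration, not ANTLR code: SourceSpanDemo, Tok, lex and spanText are hypothetical names, and the regex tokenizer is an assumption made only to keep the example self-contained. Each token records its character offsets, and the original text is read back from the source between the first token's start and the last token's end. In ANTLR's Java runtime, Token.getStartIndex() and Token.getStopIndex() provide comparable offsets.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: recover the original text of a token range from character offsets.
final class SourceSpanDemo {
    // Hypothetical token that remembers its inclusive start/stop offsets in the source.
    static final class Tok {
        final String text;
        final int start;
        final int stop;

        Tok(String text, int start, int stop) {
            this.text = text;
            this.start = start;
            this.stop = stop;
        }
    }

    // Words/numbers, or single punctuation characters; whitespace is dropped entirely.
    private static final Pattern TOKEN = Pattern.compile("\\w+|[^\\w\\s]");

    static List<Tok> lex(String src) {
        List<Tok> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(src);
        while (m.find()) {
            out.add(new Tok(m.group(), m.start(), m.end() - 1));
        }
        return out;
    }

    // Original text covered by tokens firstTok..lastTok, taken from the source itself.
    static String spanText(String src, int firstTok, int lastTok) {
        List<Tok> toks = lex(src);
        return src.substring(toks.get(firstTok).start, toks.get(lastTok).stop + 1);
    }

    public static void main(String[] args) {
        String src = "Object o = Ma.addToObj(r1); if(h.isFull() && !h.contains(true)) h.update(o);";
        int last = lex(src).size() - 1;
        System.out.println(spanText(src, 0, 9));     // first statement, spacing intact
        System.out.println(spanText(src, 10, last)); // second statement, spacing intact
    }
}
```

Because the text comes from the source string rather than from the tokens, this works even when the lexer throws whitespace away.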

With ANTLR 4 and Python 3, the code is as follows:

    def exitSomeDecl(self, ctx: yourParser.SomeDeclContext):
        start_index = ctx.start.tokenIndex
        stop_index = ctx.stop.tokenIndex
        user_text = self.token_stream.getText(interval=(start_index, stop_index))

where token_stream is assigned during initialization:

    input_stream = FileStream(file_name)
    lexer = sdplLexer(input_stream)
    token_stream = CommonTokenStream(lexer)