ANTLR4: ignore spaces in input, but not in string literals

I have a simple grammar as follows:

    grammar SampleConfig;

    line   : ID (WS)* '=' (WS)* string;
    ID     : [a-zA-Z]+;
    string : '"' (ESC|.)*? '"' ;
    ESC    : '\\"' | '\\\\' ; // 2-char sequences \" and \\
    WS     : [ \t]+ -> skip;

Spaces in the input are ignored completely, including those inside the string literal:

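    import org.antlr.v4.runtime.ANTLRInputStream;
    import org.antlr.v4.runtime.CommonTokenStream;

    final String input = "key = \"value with spaces in between\"";
    // ANTLRInputStream is deprecated since ANTLR 4.7; CharStreams.fromString(input) is the newer equivalent
    final SampleConfigLexer l = new SampleConfigLexer(new ANTLRInputStream(input));
    final SampleConfigParser p = new SampleConfigParser(new CommonTokenStream(l));
    final SampleConfigParser.LineContext context = p.line();
    System.out.println(context.getChildCount() + ": " + context.getText());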

This prints the following output:

 3: key="valuewithspacesinbetween" 

But I expected the whitespace inside the string literal to be preserved, i.e.:

 3: key="value with spaces in between" 

Is it possible to correct the grammar to achieve this behavior, or should I just override CommonTokenStream to ignore spaces during the parsing process?

1 answer

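You should not expect any spaces in your parser rules, because your lexer never hands them over: WS : [ \t]+ -> skip discards whitespace before the parser sees it, and since string is a parser rule, it can only match the tokens that remain.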
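To see where the spaces go, here is a minimal hand-written Java imitation of the question's lexer. It is not the ANTLR-generated SampleConfigLexer; the class SkippedWhitespaceDemo and its methods are hypothetical, and it assumes ANTLR's usual behaviour of dropping skip tokens entirely. Each word inside the quotes becomes a separate ID token with the spaces already gone, so concatenating the token texts (which is what getText() on the parse tree does) gives back the same squashed text as in the question.

```java
import java.util.ArrayList;
import java.util.List;

public class SkippedWhitespaceDemo {

    // Hypothetical hand-written imitation (not ANTLR-generated) of the question's lexer:
    // ID : [a-zA-Z]+ ; ESC : '\\"' | '\\\\' ; literals '=' and '"' ; WS : [ \t]+ -> skip
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == ' ' || c == '\t') {
                i++; // WS -> skip: whitespace is discarded and never becomes a token
            } else if (isAsciiLetter(c)) {
                int start = i;
                while (i < input.length() && isAsciiLetter(input.charAt(i))) {
                    i++;
                }
                tokens.add(input.substring(start, i)); // ID, also for words inside the quotes
            } else if (c == '\\' && i + 1 < input.length()
                    && (input.charAt(i + 1) == '"' || input.charAt(i + 1) == '\\')) {
                tokens.add(input.substring(i, i + 2)); // ESC
                i += 2;
            } else if (c == '=' || c == '"') {
                tokens.add(String.valueOf(c)); // implicit literal tokens from the parser rules
                i++;
            } else {
                throw new IllegalArgumentException("token recognition error at index " + i);
            }
        }
        return tokens;
    }

    static boolean isAsciiLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    public static void main(String[] args) {
        String input = "key = \"value with spaces in between\"";
        List<String> tokens = tokenize(input);
        System.out.println(tokens);                  // prints [key, =, ", value, with, spaces, in, between, "]
        System.out.println(String.join("", tokens)); // prints key="valuewithspacesinbetween"
    }
}
```

The parser cannot restore what the lexer threw away, so overriding CommonTokenStream would not help either; the fix has to happen in the lexer.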

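Either remove the skip command, so that WS tokens reach the parser and the . wildcard in string can match them, or turn string into a lexer rule (lexer rule names start with an uppercase letter) and reference STRING from line. The (WS)* parts of line can then go as well, since skipped whitespace never reaches the parser: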

 STRING : '"' ( '\\' [\\"] | ~[\\"\r\n] )* '"'; 
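The same kind of sketch for the lexer-rule version: again a hypothetical hand-written stand-in (StringTokenDemo, tokenize and readString are illustrative names), not ANTLR itself. Once the quoted literal is matched as a single STRING token, the spaces and the \" and \\ escapes inside it are ordinary characters of that token, so the WS rule never gets to skip them. The parser then sees three tokens (ID, '=', STRING), mirroring the expected output from the question.

```java
import java.util.ArrayList;
import java.util.List;

public class StringTokenDemo {

    // Hypothetical hand-written imitation (not ANTLR-generated) of a lexer with
    // ID : [a-zA-Z]+ ; literal '=' ; STRING : '"' ( '\\' [\\"] | ~[\\"\r\n] )* '"' ; WS : [ \t]+ -> skip
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == ' ' || c == '\t') {
                i++; // WS -> skip, but only between tokens
            } else if (isAsciiLetter(c)) {
                int start = i;
                while (i < input.length() && isAsciiLetter(input.charAt(i))) {
                    i++;
                }
                tokens.add(input.substring(start, i)); // ID
            } else if (c == '=') {
                tokens.add("=");
                i++;
            } else if (c == '"') {
                String str = readString(input, i); // whole literal as one STRING token
                tokens.add(str);
                i += str.length();
            } else {
                throw new IllegalArgumentException("token recognition error at index " + i);
            }
        }
        return tokens;
    }

    // Matches one STRING token starting at the opening quote; spaces inside it
    // belong to the token, so the WS rule never sees them
    static String readString(String input, int start) {
        int i = start + 1;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '"') {
                return input.substring(start, i + 1); // include the closing quote
            } else if (c == '\\') {
                char next = i + 1 < input.length() ? input.charAt(i + 1) : '\0';
                if (next != '\\' && next != '"') {
                    break; // only \\ and \" are allowed after a backslash
                }
                i += 2; // escape sequence
            } else if (c == '\r' || c == '\n') {
                break; // newlines are excluded by ~[\\"\r\n]
            } else {
                i++;
            }
        }
        // simplification: a real ANTLR lexer would report a token recognition error instead
        throw new IllegalArgumentException("unterminated string literal at index " + start);
    }

    static boolean isAsciiLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    public static void main(String[] args) {
        String input = "key = \"value with spaces in between\"";
        List<String> tokens = tokenize(input);
        // prints 3: key="value with spaces in between"
        System.out.println(tokens.size() + ": " + String.join("", tokens));
    }
}
```

In the real grammar, the rule set would correspondingly shrink to something like line : ID '=' STRING ; with the STRING rule above and WS still skipped.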

Source: https://habr.com/ru/post/1210945/