How to manage extra space in ANTLR?

Question

I am trying to parse a data file in ANTLR. It has optional spaces, as illustrated below:

  3 6
  97 12
  15 18

The following shows where each line begins (^) and ends ($). There is a newline at the end and there are no tabs.

 ^ 3 6$
 ^ 97 12$
 ^ 15 18$
 ^

My grammar:

 lines : line+;
 line  : ws1 {System.out.println("WSOPT :"+$ws1.text+":");}
         num1 {System.out.println("NUM1 "+$num1.text);}
         ws2 {System.out.println("WS :"+$ws2.text+":");}
         num2 {System.out.println("NUM2 "+$num2.text);}
         NEWLINE
       ;
 num1 : INT ;
 num2 : INT ;
 ws1 : WSOPT;
 ws2 : WS;
 INT : '0'..'9'+;
 NEWLINE : '\r'? '\n';
 //WS : (' '|'\t' )+ ;
 WS : (' ')+ ;
 WSOPT : (' ')* ;

which gives

 line 1:0 mismatched input ' ' expecting WSOPT
 WSOPT :null:
 NUM1 3
 WS : :
 NUM2 6
 line 2:0 mismatched input ' ' expecting WSOPT
 WSOPT :null:
 NUM1 97
 WS : :
 NUM2 12
 BUILD SUCCESSFUL (total time: 1 second)

(i.e., the leading WS was not recognized and the last line was skipped).

I would also like to parse lines that start without spaces, for example:

 ^12 34$
 ^ 23 97$

but I get errors like:

 line 1:0 required (...)+ loop did not match anything at input ' '

I would appreciate a general explanation of WS parsing in ANTLR.

EDIT: @jitter has a useful answer. {ignore=WS} does not appear in the "Definitive ANTLR Reference" book I am working from, so this is clearly a complex area.

HELP still needed. I changed this to:

 lines : line line line;
 line
 options { ignore=WS; }
   : ws1 {System.out.println("WSOPT :"+$ws1.text+":");}
     num1 {System.out.println("NUM1 "+$num1.text);}
     ws2 {System.out.println("WS :"+$ws2.text+":");}
     num2 {System.out.println("NUM2 "+$num2.text);}
     NEWLINE
   ;

but get an error:

 illegal option ignore

EDIT: obviously this has been removed from V3: http://www.antlr.org/pipermail/antlr-interest/2007-February/019423.html

+6

java whitespace antlr

peter.murray.rust Oct 31 '09 at 12:02

3 answers

 WS : (' ' | '\t')+ {$channel = HIDDEN;} ;
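To picture what putting WS on the hidden channel means, here is a small hand-written Java sketch. It is not code generated by ANTLR, and the class name, the `Token` type and the channel numbers are all made up for illustration. The idea, as I understand it, is that whitespace tokens are still produced by the lexer but are tagged so that the parser never sees them.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written illustration only; this is NOT ANTLR's generated lexer.
class HiddenChannelSketch {
    static final int DEFAULT = 0;
    static final int HIDDEN = 1; // hypothetical channel number for this sketch

    static final class Token {
        final String type;
        final String text;
        final int channel;

        Token(String type, String text, int channel) {
            this.type = type;
            this.text = text;
            this.channel = channel;
        }
    }

    static boolean isBlank(char c) { return c == ' ' || c == '\t'; }
    static boolean isDigit(char c) { return c >= '0' && c <= '9'; }

    // Split the input into INT, NEWLINE and WS tokens; WS is tagged HIDDEN.
    static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            int start = i;
            if (isBlank(c)) {
                while (i < input.length() && isBlank(input.charAt(i))) i++;
                tokens.add(new Token("WS", input.substring(start, i), HIDDEN));
            } else if (isDigit(c)) {
                while (i < input.length() && isDigit(input.charAt(i))) i++;
                tokens.add(new Token("INT", input.substring(start, i), DEFAULT));
            } else if (c == '\n') {
                i++;
                tokens.add(new Token("NEWLINE", "\n", DEFAULT));
            } else if (c == '\r' && i + 1 < input.length() && input.charAt(i + 1) == '\n') {
                i += 2;
                tokens.add(new Token("NEWLINE", "\r\n", DEFAULT));
            } else {
                throw new IllegalArgumentException("unexpected character at offset " + i);
            }
        }
        return tokens;
    }

    // What a parser reading only the default channel would see.
    static List<String> visibleTypes(List<Token> tokens) {
        List<String> types = new ArrayList<>();
        for (Token t : tokens) {
            if (t.channel == DEFAULT) types.add(t.type);
        }
        return types;
    }

    public static void main(String[] args) {
        List<Token> tokens = tokenize(" 3 6\n12 34\n");
        System.out.println(tokens.size() + " tokens produced by the lexer");
        System.out.println("parser sees: " + visibleTypes(tokens));
    }
}
```

With whitespace out of the parser's view, lines with and without leading spaces look the same to the parser rules, so those rules would not need to mention WS at all.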

+8

Sam Harwell Nov 10 '09 at 5:13

See Lexical Analysis with ANTLR, and then search for the section that starts with this heading:

Ignore spaces in the lexer

You need to use the rule option { ignore=WS; }

+2

jitter Oct 31 '09 at 12:23

peter.murray.rust · Accepted Answer · 2009-11-10T05:09:46+0000

I managed to get this working using lexer constructs such as:

 WS : (' ')+ {skip();};
 WSOPT : (' ')* {skip();};

but not in NEWLINE. Then in parser constructs such as:

 num1 num2 NEWLINE;

The key was to discard all whitespace in the lexer, except NEWLINE, so that the parser rules never have to mention it.
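For anyone who wants to see the idea outside ANTLR, below is a rough hand-written Java sketch of the same approach. It is not what ANTLR generates, and the class and method names are hypothetical. The lexer step drops spaces and tabs without emitting a token but keeps NEWLINE, and the parser step then only has to match two numbers followed by NEWLINE.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written illustration only; this is NOT ANTLR's generated code.
class SkipWhitespaceSketch {
    static final String NEWLINE = "\n";

    static boolean isDigit(char c) { return c >= '0' && c <= '9'; }

    // Lexer step: emit number and NEWLINE tokens, skip spaces and tabs.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == ' ' || c == '\t') {
                i++; // skipped: no token is produced for whitespace
            } else if (isDigit(c)) {
                int start = i;
                while (i < input.length() && isDigit(input.charAt(i))) i++;
                tokens.add(input.substring(start, i));
            } else if (c == '\n') {
                i++;
                tokens.add(NEWLINE);
            } else if (c == '\r' && i + 1 < input.length() && input.charAt(i + 1) == '\n') {
                i += 2;
                tokens.add(NEWLINE); // "\r\n" is treated as a single NEWLINE
            } else {
                throw new IllegalArgumentException("unexpected character at offset " + i);
            }
        }
        return tokens;
    }

    static boolean isInt(String token) {
        return !token.isEmpty() && isDigit(token.charAt(0));
    }

    // Parser step, mirroring: line : num1 num2 NEWLINE ;
    static List<int[]> parseLines(List<String> tokens) {
        List<int[]> lines = new ArrayList<>();
        int pos = 0;
        while (pos < tokens.size()) {
            if (pos + 2 >= tokens.size()
                    || !isInt(tokens.get(pos))
                    || !isInt(tokens.get(pos + 1))
                    || !tokens.get(pos + 2).equals(NEWLINE)) {
                throw new IllegalStateException("expected INT INT NEWLINE at token " + pos);
            }
            lines.add(new int[] {
                Integer.parseInt(tokens.get(pos)),
                Integer.parseInt(tokens.get(pos + 1))
            });
            pos += 3;
        }
        return lines;
    }

    public static void main(String[] args) {
        // Lines with and without leading spaces are handled the same way.
        for (int[] pair : parseLines(tokenize(" 3 6\n 97 12\n15 18\n"))) {
            System.out.println("NUM1 " + pair[0] + " NUM2 " + pair[1]);
        }
    }
}
```

Note that NEWLINE has to stay a real token here, because it is the only thing that tells the parser where one line ends and the next begins.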
