ANTLR lexer can't lookahead at all

Question

I have the following grammar:

rule: 'aaa' | 'a' 'a';

It successfully parses the string "aaa", but fails on "aa" with the following error:

 line 1:2 mismatched character '<EOF>' expecting 'a'

FYI, this is a lexer problem, not a parser problem, because I don't even call the parser. The main function looks like this:

 @members {
     public static void main(String[] args) throws Exception {
         RecipeLexer lexer = new RecipeLexer(new ANTLRInputStream(System.in));
         for (Token t = lexer.nextToken(); t.getType() != EOF; t = lexer.nextToken())
             System.out.println(t.getType());
     }
 }

The result is the same with the more explicit version:

 rule: AAA | A A;
 AAA: 'aaa';
 A: 'a';

Evidently the ANTLR lexer tries to match the input "aa" against the AAA rule, which fails. Whether or not ANTLR is an LL(*) parser generator, the lexer should work independently of the parser and should be able to resolve this ambiguity. The grammar works fine with good old lex (or flex), but apparently not with ANTLR. So what's the problem?

Thanks for the help!

+4

lex antlr antlr3 lexer

Kj Aug 30 '12 at 5:38

1 answer

Bart Kiers · Accepted Answer · 2012-08-30T06:44:54+0000

The parsers ANTLR generates are (or can be) LL(*); its lexers are not.

When the lexer sees the input "aa", it tries to match the AAA token. When that fails, it tries to match any other token that also matches "aa" (the lexer does not backtrack to match A!). Since there is no such token, an error is produced.
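To make the difference concrete, here is a minimal sketch in plain Java. It is a toy model, not the code ANTLR generates: the class name LexerSketch, both method names, and the fixed two-character lookahead are assumptions made purely for illustration. One method picks a rule from the lookahead and then commits to it; the other mimics the lex/flex approach of trying every rule and keeping the longest match.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: this is NOT the code ANTLR generates.
class LexerSketch {
    // Token names and literals mirroring the grammar: AAA: 'aaa'; A: 'a';
    static final String[] NAMES = {"AAA", "A"};
    static final String[] LITERALS = {"aaa", "a"};

    // Predict a rule from two characters of lookahead, then commit to it.
    // The two-character lookahead is an assumption made for illustration.
    static List<String> commitAfterPrediction(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            int rule = input.startsWith("aa", pos) ? 0 : 1;
            if (!input.startsWith(LITERALS[rule], pos)) {
                // No fallback to another rule: the prediction was final.
                throw new IllegalStateException(
                    "mismatch while matching " + NAMES[rule] + " at offset " + pos);
            }
            tokens.add(NAMES[rule]);
            pos += LITERALS[rule].length();
        }
        return tokens;
    }

    // lex/flex style: try every rule and keep the longest literal that fully matches.
    static List<String> longestMatch(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            int best = -1;
            for (int i = 0; i < LITERALS.length; i++) {
                boolean matches = input.startsWith(LITERALS[i], pos);
                if (matches && (best < 0 || LITERALS[i].length() > LITERALS[best].length())) {
                    best = i;
                }
            }
            if (best < 0) {
                throw new IllegalStateException("no rule matches at offset " + pos);
            }
            tokens.add(NAMES[best]);
            pos += LITERALS[best].length();
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(longestMatch("aa"));
        try {
            System.out.println(commitAfterPrediction("aa"));
        } catch (IllegalStateException e) {
            System.out.println("error: " + e.getMessage());
        }
    }
}
```

In this sketch, longestMatch("aa") yields two A tokens, while commitAfterPrediction("aa") throws, because nothing sends it back to try A once AAA has been chosen. Both methods return a single AAA token for "aaa".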

This is usually not a problem, since in practice there is often some sort of identifier rule that "aa" can fall back to. So, what is the actual problem you are trying to solve, or were you just curious about the inner workings? If it is the former, please edit your question and describe your actual problem.
