ANTLR Parser with manual lexer

Question

I am migrating a compiler for a programming language, written in C#, from a hand-written lexer/parser to Antlr.

Antlr has given me severe headaches: it usually mostly works, but then there are small parts that don't, and those are incredibly painful to fix.

I found that most of my headaches are caused by the Antlr lexer, not the parser. Then I noticed the parser grammar X; declaration and realized that maybe I could keep a hand-written lexer and put an Antlr-generated parser on top of it.

So I am looking for more documentation on this. I think a custom ITokenStream might work, but there seems to be virtually no documentation about it online...

c# antlr lexer parser-generator

luiscubal Dec 10 '10 at 23:21

1 answer

luiscubal · Accepted Answer · 2010-12-11T01:48:29+0000

I figured out how to do it. This may not be the best approach, but it seems to work.

Antlr parsers take an ITokenStream as their parameter
Antlr lexers are themselves ITokenSources
ITokenSource is a significantly simpler interface than ITokenStream
The easiest way to turn an ITokenSource into an ITokenStream is to use CommonTokenStream, which takes an ITokenSource parameter
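The reason a wrapper is needed: a token source only hands out one token per NextToken() call, while a parser needs lookahead, i.e. peeking at the k-th upcoming token without consuming it. The sketch below is a simplified illustration, not Antlr's code: SimpleToken, ISimpleTokenSource and LookaheadStream are hypothetical stand-ins, so it compiles without the Antlr runtime. It shows roughly what a buffering stream such as CommonTokenStream adds on top of a source: tokens are pulled lazily into a buffer, and tokens that are not on the default channel 0 are kept but skipped.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for an Antlr token; only the fields this sketch needs.
public sealed class SimpleToken
{
    public const int EOF = -1;   // Antlr 3 uses -1 as the EOF token type
    public int Type { get; }
    public string Text { get; }
    public int Channel { get; }  // 0 = default channel; anything else is off-channel

    public SimpleToken(int type, string text, int channel = 0)
    {
        Type = type;
        Text = text;
        Channel = channel;
    }
}

// Hypothetical stand-in for ITokenSource: one token per call, EOF at the end.
public interface ISimpleTokenSource
{
    SimpleToken NextToken();
}

// Buffers tokens pulled from a source and offers k-token lookahead
// over the default channel (hypothetical, simplified).
public sealed class LookaheadStream
{
    private readonly ISimpleTokenSource _source;
    private readonly List<SimpleToken> _buffer = new List<SimpleToken>();
    private int _pos; // index of the first unconsumed token in _buffer

    public LookaheadStream(ISimpleTokenSource source)
    {
        _source = source;
    }

    // Returns the k-th upcoming default-channel token (k >= 1) without consuming it.
    public SimpleToken LT(int k)
    {
        int seen = 0;
        for (int i = _pos; ; i++)
        {
            // Pull from the source only when the buffer runs out.
            while (i >= _buffer.Count) _buffer.Add(_source.NextToken());
            SimpleToken t = _buffer[i];
            if (t.Type == SimpleToken.EOF) return t;       // never look past EOF
            if (t.Channel == 0 && ++seen == k) return t;   // hidden tokens are skipped
        }
    }

    // Consumes the next default-channel token along with any hidden tokens before it.
    public void Consume()
    {
        SimpleToken next = LT(1);                   // also fills the buffer up to it
        if (next.Type == SimpleToken.EOF) return;   // consuming at EOF is a no-op
        while (_buffer[_pos] != next) _pos++;       // step over hidden tokens
        _pos++;                                     // step over the consumed token
    }
}
```

A parser sitting on such a stream can inspect LT(1) and LT(2) to choose between alternatives before calling Consume(); hidden tokens such as whitespace stay in the buffer but are never returned to it.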

So we only need to do two things:

Turn the grammar into a parser-only grammar
Implement ITokenSource

Setting up the grammar is very simple: just remove all lexer rules and make sure you declare the grammar as a parser grammar. Here is a simple example:

    parser grammar mygrammar;

    options {
        language=CSharp2;
    }

    @parser::namespace { MyNamespace }

    document
        : ( WORD   {Console.WriteLine($WORD.text);}
          | NUMBER {Console.WriteLine($NUMBER.text);}
          )*
        ;

Note that this grammar file generates a class named mygrammar instead of mygrammarParser.

Now we need to implement the "fake" lexer. I used the following pseudocode:

    TokenQueue q = new TokenQueue();
    // Do normal lexer stuff and output to q
    CommonTokenStream cts = new CommonTokenStream(q);
    mygrammar g = new mygrammar(cts);
    g.document();

Finally, we need to define TokenQueue. TokenQueue is not strictly necessary, but I used it for convenience. It should have methods for receiving tokens from the lexer and methods for handing out Antlr tokens. So if your lexer does not produce Antlr's own token objects, you need to implement a method that converts your tokens into Antlr tokens. In addition, TokenQueue must implement ITokenSource.
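A minimal sketch of such a TokenQueue, assuming the hand-written lexer produces its own token class (here a hypothetical MyToken with a MyKind enum). To keep it compilable without the Antlr runtime, AntlrToken is a hypothetical stand-in for Antlr's CommonToken, with fields named after the corresponding IToken properties, and the class only mirrors ITokenSource.NextToken() instead of implementing the real interface. The token type numbers are placeholders; real code must use the constants Antlr generates for WORD and NUMBER in the mygrammar class.

```csharp
using System.Collections.Generic;

// Hypothetical token produced by the hand-written lexer.
public enum MyKind { Word, Number, Whitespace }

public sealed class MyToken
{
    public MyKind Kind;
    public string Text;
    public int Line;     // 1-based line of the first character
    public int Column;   // 0-based column of the first character

    public MyToken(MyKind kind, string text, int line, int column)
    {
        Kind = kind; Text = text; Line = line; Column = column;
    }
}

// Hypothetical stand-in for Antlr's CommonToken.
public sealed class AntlrToken
{
    public int Type;
    public string Text;
    public int Line;
    public int CharPositionInLine;
    public int Channel;
}

public sealed class TokenQueue
{
    // Placeholder token types; real code must use the constants generated in mygrammar.
    // WS is never referenced by the grammar, which is fine because it goes off-channel.
    public const int EOF = -1, WORD = 4, NUMBER = 5, WS = 6;
    public const int DefaultChannel = 0, HiddenChannel = 99;

    private readonly Queue<MyToken> _pending = new Queue<MyToken>();

    // Input side: the hand-written lexer pushes each token it recognizes.
    public void Enqueue(MyToken token)
    {
        _pending.Enqueue(token);
    }

    // Output side, mirroring ITokenSource.NextToken(): one converted token per call,
    // then an EOF token on every call once the queue is empty.
    public AntlrToken NextToken()
    {
        if (_pending.Count == 0)
            return new AntlrToken { Type = EOF, Text = "<EOF>", Channel = DefaultChannel };
        return ToAntlrToken(_pending.Dequeue());
    }

    // The conversion step: map the token kind and copy the position fields.
    private static AntlrToken ToAntlrToken(MyToken t)
    {
        int type = t.Kind == MyKind.Word ? WORD
                 : t.Kind == MyKind.Number ? NUMBER
                 : WS;
        return new AntlrToken
        {
            Type = type,
            Text = t.Text,
            Line = t.Line,
            CharPositionInLine = t.Column,
            Channel = t.Kind == MyKind.Whitespace ? HiddenChannel : DefaultChannel
        };
    }
}
```

Keeping the conversion in one private method means the hand-written lexer never has to know about Antlr's token types or channels.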

Keep in mind that it is very important to set the token fields correctly. Initially, I had some problems because I was miscalculating CharPositionInLine. If these fields are set incorrectly, the parser may fail. Also, the normal (non-hidden) channel is 0.
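Getting the positions right is mostly bookkeeping inside the hand-written lexer. Antlr 3 counts lines from 1 and CharPositionInLine from 0, and each token should record where its first character starts. A minimal sketch for a toy language of words, numbers and whitespace; the Lexeme type and ToyLexer.Tokenize are hypothetical, not part of Antlr. Whitespace is placed on channel 99 (Antlr 3's hidden channel) instead of being dropped, so a parser reading channel 0 never sees it.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical token record produced by a hand-written lexer.
public sealed class Lexeme
{
    public string Kind;
    public string Text;
    public int Line;                // 1-based
    public int CharPositionInLine;  // 0-based
    public int Channel;
}

public static class ToyLexer
{
    public const int DefaultChannel = 0, HiddenChannel = 99;

    // Splits input into WORD, NUMBER and WS lexemes, recording where each one starts.
    public static List<Lexeme> Tokenize(string input)
    {
        var result = new List<Lexeme>();
        int i = 0, line = 1, col = 0;
        while (i < input.Length)
        {
            // Remember the start position before consuming any characters.
            int start = i, startLine = line, startCol = col;
            char c = input[i];
            string kind;
            if (char.IsWhiteSpace(c))
            {
                kind = "WS";
                while (i < input.Length && char.IsWhiteSpace(input[i])) Advance(input, ref i, ref line, ref col);
            }
            else if (char.IsDigit(c))
            {
                kind = "NUMBER";
                while (i < input.Length && char.IsDigit(input[i])) Advance(input, ref i, ref line, ref col);
            }
            else if (char.IsLetter(c))
            {
                kind = "WORD";
                while (i < input.Length && char.IsLetterOrDigit(input[i])) Advance(input, ref i, ref line, ref col);
            }
            else
            {
                throw new FormatException($"Unexpected '{c}' at line {line}, column {col}");
            }
            result.Add(new Lexeme
            {
                Kind = kind,
                Text = input.Substring(start, i - start),
                Line = startLine,
                CharPositionInLine = startCol,
                Channel = kind == "WS" ? HiddenChannel : DefaultChannel
            });
        }
        return result;
    }

    // Moves past one character; a '\n' starts a new line and resets the column.
    private static void Advance(string input, ref int i, ref int line, ref int col)
    {
        if (input[i] == '\n') { line++; col = 0; }
        else { col++; }
        i++;
    }
}
```

For the input "foo 12\n  bar7", the final WORD starts at line 2, CharPositionInLine 2, because the newline inside the preceding whitespace token resets the column.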

It seems to work for me so far. I hope others find this helpful. I am open to feedback. In particular, if you find a better way to solve this problem, feel free to post a separate answer.
