ANTLR 4 - How to access the hidden comment channel from a user listener?

I am writing a pretty printer for legacy code in an older language. The plan is for me to learn parsing and emitting before I write a translator that outputs C++. I was thrown into the deep end with Java and ANTLR in June, so I definitely have knowledge gaps.

I have reached the point where I am comfortable writing methods for my custom listener, and I also want to be able to print comments. My comments are on a separate hidden channel. Here are the grammar rules for hidden tokens:

    /* Comments and whitespace -- Nested comments are allowed, each is redirected to a specific channel */
    COMMENT_1  : '(*' (COMMENT_1|COMMENT_2|.)*? '*)' -> channel(1) ;
    COMMENT_2  : '{' (COMMENT_1|COMMENT_2|.)*? '}' -> channel(1) ;
    NEWLINES   : [\r\n]+ -> channel(2) ;
    WHITESPACE : [ \t]+ -> skip ;

I played with the Cymbol CommentShifter example on page 207 of The Definitive ANTLR 4 Reference, and I'm trying to figure out how to adapt it to my listener methods.

    public void exitVarDecl(ParserRuleContext ctx) {
        Token semi = ctx.getStop();
        int i = semi.getTokenIndex();
        List<Token> cmtChannel = tokens.getHiddenTokensToRight(i, CymbolLexer.COMMENTS);
        if (cmtChannel != null) {
            Token cmt = cmtChannel.get(0);
            if (cmt != null) {
                String txt = cmt.getText().substring(2);
                String newCmt = "// " + txt.trim(); // printing comments in original format
                rewriter.insertAfter(ctx.stop, newCmt); // at end of line
                rewriter.replace(cmt, "\n");
            }
        }
    }

I adapted this example using exitEveryRule rather than exitVarDecl, and it worked for the Cymbol example, but when I adapt it to my own listener, I get a null pointer exception whether I use exitEveryRule or exitSpecificThing.

I looked at this answer and it seems promising, but I think what I really need is an explanation of how the hidden channel data is stored and how to access it. It took me several months to really understand listener methods and contexts in the parse tree.

It seems that CommonTokenStream.LT(), CommonTokenStream.LA() and consume() are what I want to use, but why does the example in that Stack Overflow answer use completely different methods from the ANTLR book example? What should I know about token indexes or token types?

I would like to better understand the logic of this.

java antlr4
1 answer

Okay, so I can't answer how ANTLR stores its data internally, but I can tell you how to access your hidden tokens. I tested this on my computer using ANTLR v4.1 for C# on .NET v4.5.2.

I have a rule that looks like this:

 LineComment : '//' ~[\r\n]* -> channel(1) ; 

In my code, I get the whole raw token stream as follows:

 IList<IToken> lTokenList = cmnTokenStream.Get( 0, cmnTokenStream.Size ); 

To check, I printed out a list of tokens using the following loop:

    foreach ( IToken iToken in lTokenList )
    {
        Console.WriteLine( "{0}[{1}] : {2}", iToken.Channel, iToken.TokenIndex, iToken.Text );
    }

Running it on this input:

    void Foo()
    {
        // comment
        i = 5;
    }

Outputs the following result (for brevity, please assume that I have a complete grammar that also ignores spaces):

    0[0] : void
    0[1] : Foo
    0[2] : (
    0[3] : )
    0[4] : {
    1[5] : // comment
    0[6] : i
    0[7] : =
    0[8] : 5
    0[9] : ;
    0[10] : }

Each line shows the channel, then the token index in brackets, then the token text. You can see that the only token on channel 1 is the comment token. Thus, you can use this loop to access only comment tokens:

    int lCommentCount = 0;
    foreach ( IToken iToken in lTokenList )
    {
        if ( iToken.Channel == 1 )
        {
            Console.WriteLine( "{0} : {1}", lCommentCount++, iToken.Text );
        }
    }
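Since the question is tagged Java, here is a rough sketch of the same idea in plain Java. It does not use the ANTLR runtime: Tok, onChannel, hiddenToRight and sample are made-up names that only model a flat token list in which every token carries a channel number and an index. The hiddenToRight helper is my guess at what a lookup like the book's getHiddenTokensToRight does conceptually, not ANTLR's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

class HiddenChannelSketch {
    static final int DEFAULT_CHANNEL = 0;
    static final int COMMENT_CHANNEL = 1;

    // Hypothetical stand-in for a token: just channel, index and text.
    static final class Tok {
        final int channel;
        final int index;
        final String text;

        Tok(int channel, int index, String text) {
            this.channel = channel;
            this.index = index;
            this.text = text;
        }

        @Override
        public String toString() {
            return channel + "[" + index + "] : " + text;
        }
    }

    // Every token on the given channel, in source order.
    static List<Tok> onChannel(List<Tok> all, int channel) {
        List<Tok> out = new ArrayList<>();
        for (Tok t : all) {
            if (t.channel == channel) {
                out.add(t);
            }
        }
        return out;
    }

    // Tokens on `channel` that directly follow position i, stopping at the
    // next default-channel token. Returns an empty list if there are none.
    static List<Tok> hiddenToRight(List<Tok> all, int i, int channel) {
        List<Tok> out = new ArrayList<>();
        for (int j = i + 1; j < all.size(); j++) {
            Tok t = all.get(j);
            if (t.channel == DEFAULT_CHANNEL) {
                break;
            }
            if (t.channel == channel) {
                out.add(t);
            }
        }
        return out;
    }

    // The token list from the answer's Foo() example, built by hand.
    static List<Tok> sample() {
        String[] texts = {"void", "Foo", "(", ")", "{", "// comment", "i", "=", "5", ";", "}"};
        List<Tok> all = new ArrayList<>();
        for (int k = 0; k < texts.length; k++) {
            int ch = texts[k].startsWith("//") ? COMMENT_CHANNEL : DEFAULT_CHANNEL;
            all.add(new Tok(ch, k, texts[k]));
        }
        return all;
    }

    public static void main(String[] args) {
        List<Tok> all = sample();
        System.out.println(onChannel(all, COMMENT_CHANNEL));
        System.out.println(hiddenToRight(all, 4, COMMENT_CHANNEL)); // right of "{"
        System.out.println(hiddenToRight(all, 9, COMMENT_CHANNEL)); // right of ";"
    }
}
```

In this sample the comment sits right after the "{" token at index 4, so a lookup to the right of ";" at index 9 finds nothing. The sketch returns an empty list in that case. The listener in the question checks the real method's result for null, so the real method presumably can return null, and it is worth guarding against that in your own listener.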

Then you can do anything with these tokens. This also works if you have multiple channels, although I would caution against using more than 65536 channels. ANTLR gave the following error when I tried to compile a grammar with a token rule redirecting to channel index 65536:

 Serialized ATN data element out of range. 

Therefore, I assume that they use only a 16-bit unsigned integer to index the channels. Weird.
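To spell out the arithmetic behind that guess: a 16-bit unsigned value tops out at 65535, so 65536 is the first index that no longer fits. A tiny Java check, using char because it is Java's 16-bit unsigned type (fitsInUnsigned16 is just a helper name I made up, and this says nothing about how ANTLR actually serializes the ATN):

```java
class SixteenBitRange {
    // True if value can be stored in 16 unsigned bits (0..65535).
    static boolean fitsInUnsigned16(int value) {
        return value >= 0 && value <= Character.MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(fitsInUnsigned16(65535)); // true
        System.out.println(fitsInUnsigned16(65536)); // false
    }
}
```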
