Okay, so I can't answer how ANTLR stores its data internally, but I can tell you how to access your hidden tokens. I tested this on my computer using ANTLR v4.1 for C# on .NET v4.5.2.
I have a rule that looks like this:
LineComment : '//' ~[\r\n]* -> channel(1) ;
In my code, I get the whole raw token stream as follows:
IList<IToken> lTokenList = cmnTokenStream.Get( 0, cmnTokenStream.Size );
To check, I printed out a list of tokens using the following loop:
foreach ( IToken iToken in lTokenList )
{
    Console.WriteLine( "{0}[{1}] : {2}", iToken.Channel, iToken.TokenIndex, iToken.Text );
}
Running it on this input:
void Foo() {
It produces the following output (for brevity, please assume that I have a complete grammar that also ignores whitespace):
0[0] : void
0[1] : Foo
0[2] : (
0[3] : )
0[4] : {
1[5] :
You can see that the only token on channel 1 is the comment token. Thus, you can use this loop to access only the comment tokens:
int lCommentCount = 0;
foreach ( IToken iToken in lTokenList )
{
    if ( iToken.Channel == 1 )
    {
        Console.WriteLine( "{0} : {1}", lCommentCount++, iToken.Text );
    }
}
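For what it's worth, the same filter can be written with LINQ. This is only a minimal sketch: Tok is a hypothetical stand-in class (not part of ANTLR) that mirrors just the Channel, TokenIndex, and Text properties used above, so the example runs without the ANTLR runtime. OnChannel and the sample data are made up for illustration as well. With the real runtime, the same Where clause should apply to the IToken list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for IToken: only the three properties used above.
class Tok
{
    public int Channel { get; set; }
    public int TokenIndex { get; set; }
    public string Text { get; set; }
}

static class ChannelFilterDemo
{
    // Returns the tokens on the given channel, preserving stream order.
    public static List<Tok> OnChannel( IEnumerable<Tok> tokens, int channel )
    {
        return tokens.Where( t => t.Channel == channel ).ToList();
    }

    static void Main()
    {
        // Made-up sample data, not real lexer output.
        var tokens = new List<Tok>
        {
            new Tok { Channel = 0, TokenIndex = 0, Text = "void" },
            new Tok { Channel = 0, TokenIndex = 1, Text = "Foo" },
            new Tok { Channel = 1, TokenIndex = 2, Text = "// sample comment" },
        };

        // Print only the tokens that were routed to channel 1.
        foreach ( Tok t in OnChannel( tokens, 1 ) )
        {
            Console.WriteLine( "{0}[{1}] : {2}", t.Channel, t.TokenIndex, t.Text );
        }
    }
}
```

Passing the channel number as a parameter keeps the helper usable if the grammar routes different kinds of tokens to different channels.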
Then you can do anything you like with these tokens. This also works if you have multiple channels, although I would caution against using more than 65536 of them. ANTLR gave the following error when I tried to compile a grammar with a token rule redirecting to channel 65536:
Serialized ATN data element out of range.
Therefore, I assume that they use only a 16-bit unsigned integer to index the channels. Weird.
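To make the arithmetic behind that guess concrete: a 16-bit unsigned integer holds 0 through 65535, so 65536 is the first value that does not fit. The sketch below only illustrates that range. FitsInUInt16 is a hypothetical helper, and nothing here shows how ANTLR actually serializes channel numbers.

```csharp
using System;

static class ChannelRangeDemo
{
    // True if the value fits in an unsigned 16-bit integer (0..65535).
    public static bool FitsInUInt16( int value )
    {
        return value >= ushort.MinValue && value <= ushort.MaxValue;
    }

    static void Main()
    {
        Console.WriteLine( ushort.MaxValue );         // 65535
        Console.WriteLine( FitsInUInt16( 65535 ) );   // True
        Console.WriteLine( FitsInUInt16( 65536 ) );   // False
    }
}
```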