Token Return Values in ANTLR 3 C

I am new to ANTLR and am trying to write a simple parser using the C language target (antlr3c). The grammar is simple enough that I would like each rule to return a value, for example:

number returns [long value]
 :
 ( INT {$value = $INT.ivalue;}
 | HEX {$value = $HEX.hvalue;}
 ) 
 ; 

HEX returns [long hvalue] 
    : '0' 'x' ('0'..'9'|'a'..'f'|'A'..'F')+  {$hvalue = strtol((char*)$text->chars,NULL,16);}
    ;

INT returns [long ivalue] 
    : '0'..'9'+    {$ivalue = strtol((char*)$text->chars,NULL,10);}
    ;

Each rule collects the return values of its child rules until the topmost rule returns a struct full of my data.

As far as I can tell, ANTLR allows lexer rules (tokens like "INT" and "HEX") to return values exactly like parser rules (like "number"). However, the generated C code will not compile:

error C2228: left of '.ivalue' must have class/struct/union
error C2228: left of '.hvalue' must have class/struct/union

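The errors make sense: the tokens end up as a generic ANTLR3_COMMON_TOKEN_struct, which has no such members. So perhaps the C target simply does not support this.

So the question is: "Does antlr3c support return values for lexer rules, and if so, how are they used?"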

+5

This is not really new information, just some detail on what @bemace already mentioned.

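No, lexer rules cannot have return values. See section 4.3 of the ANTLR reference:


Just like function calls, ANTLR parser and tree parser rules can have arguments and return values. ANTLR lexer rules cannot have return values [...]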


There are two options:

Option 1

Convert the token text to a long in the parser rule number:

number returns [long value]
  :  INT {$value = Long.parseLong($INT.text);}
  |  HEX {$value = Long.parseLong($HEX.text.substring(2), 16);}
  ;
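
Since the question targets C and the rule above uses Java actions, here is a rough sketch of the same conversion in plain C. It assumes the token text is available as a NUL-terminated string, as the question's own lexer actions do with $text->chars. The helper name token_text_to_long is hypothetical and not part of the ANTLR runtime.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not an ANTLR API): convert the text of an INT or
 * HEX token to a long. Assumes `text` already matched one of the two
 * lexer rules, so no further validation is attempted. */
long token_text_to_long(const char *text)
{
    assert(text != NULL);

    /* HEX tokens start with "0x"; everything else is treated as decimal. */
    int radix = (strncmp(text, "0x", 2) == 0) ? 16 : 10;

    /* With base 16, strtol accepts an optional "0x" prefix itself. */
    return strtol(text, NULL, radix);
}
```

In a grammar action this might be called as $value = token_text_to_long((const char*)$INT.text->chars); assuming $INT.text yields an ANTLR3 string with a chars member, which I have not verified against the C runtime.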

Option 2

Alternatively, create your own token class that has, say, a toLong(): long method:

import org.antlr.runtime.*;

public class YourToken extends CommonToken {

  public YourToken(CharStream input, int type, int channel, int start, int stop) {
    super(input, type, channel, start, stop);
  }

  // your custom method
  public long toLong() {
    String text = super.getText();
    int radix = text.startsWith("0x") ? 16 : 10;
    if(radix == 16) text = text.substring(2);
    return Long.parseLong(text, radix);
  }
}

Then declare the token type in the options {...} section and override the emit(): Token method in the lexer:

grammar Foo;

options{
  TokenLabelType=YourToken;
}

@lexer::members {
  public Token emit() {
    YourToken t = new YourToken(input, state.type, state.channel, 
        state.tokenStartCharIndex, getCharIndex()-1);
    t.setLine(state.tokenStartLine);
    t.setText(state.text);
    t.setCharPositionInLine(state.tokenStartCharPositionInLine);
    emit(t);
    return t;
  }
}

parse
  :  number {System.out.println("parsed: "+$number.value);} EOF
  ;

number returns [long value]
  :  INT {$value = $INT.toLong();}
  |  HEX {$value = $HEX.toLong();}
  ;

HEX
  :  '0' 'x' ('0'..'9'|'a'..'f'|'A'..'F')+
  ;

INT
  :  '0'..'9'+
  ;

After generating the lexer and parser, run this test class:

import org.antlr.runtime.*;
import java.io.*;

public class Main {
    public static void main(String[] args) throws Exception {
        ANTLRStringStream in = new ANTLRStringStream("0xCafE");
        FooLexer lexer = new FooLexer(in);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        FooParser parser = new FooParser(tokens);
        parser.parse();
    }
}

It prints:

parsed: 51966

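Note that these examples are in Java. I do not know whether option 2 is supported by the C target/runtime. I posted it anyway so it can serve as a reference here on SO.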

+8

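The lexer returns Token objects; the parser returns whatever you want. So that logic has to go in a parser rule, for example: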

social_title returns [Name.Title title]
 : SIR { $title = Name.Title.SIR; }
 | 'Dame' { $title = Name.Title.DAME; }
 | MR { $title = Name.Title.MR; }
 | MS { $title = Name.Title.MS; }
 | 'Miss' { $title = Name.Title.MISS; }
 | MRS { $title = Name.Title.MRS; }
 ;
+1

There is a third option: pass an object as an argument to the lexer rule. The object has a member that holds the lexer rule's "return value". Inside the lexer rule you set that member; once the rule has been called, the caller reads the member and does whatever it wants with it. This style of parameter passing corresponds to "var" parameters in Pascal or "out" parameters in C++ and other programming languages.
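
The out-parameter idea can be sketched in plain C as follows. This is only an illustration of the calling convention, not ANTLR-generated code: the struct NumberResult and the functions hex_rule_action and read_hex are hypothetical stand-ins for the lexer rule action and its caller.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical holder for the "return value" of a lexer rule. */
typedef struct {
    long value;
} NumberResult;

/* Stand-in for the action inside a lexer rule: it stores its result in
 * the caller-supplied holder instead of returning it. */
void hex_rule_action(const char *text, NumberResult *out)
{
    assert(text != NULL && out != NULL);
    out->value = strtol(text, NULL, 16);  /* base 16 accepts a "0x" prefix */
}

/* Stand-in for the calling side: pass the holder in, then read the
 * member once the call has finished. */
long read_hex(const char *text)
{
    NumberResult result = { 0 };
    hex_rule_action(text, &result);
    return result.value;
}
```

The holder is owned by the caller, so the rule itself never has to return anything.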

+1