Antlr lexer tokens that match similar strings: what if the greedy lexer makes a mistake?

It seems that the Antlr lexer sometimes makes a bad choice about which rule to use when tokenizing the character stream. I'm trying to figure out how to help Antlr make the choice that would be obvious to a human. I want to parse text like this:

d/dt(x)=a
a=d/dt
d=3
dt=4

This is admittedly awkward syntax, used by an existing language that I am trying to write a parser for. "d/dt(x)" represents the left-hand side of a differential equation. Ignore the jargon if you must; just know that it is not "d" divided by "dt". The second occurrence of "d/dt", however, really is "d" divided by "dt".

Here is my grammar:

grammar diffeq_grammar;

program :   (statement? NEWLINE)*;

statement
    :   diffeq
    |   assignment;

diffeq  :   DDT ID ')' '=' ID;

assignment
    :   ID '=' NUMBER
    |   ID '=' ID '/' ID
    ;

DDT :   'd/dt(';
ID  :   'a'..'z'+;
NUMBER  :   '0'..'9'+;
NEWLINE :   '\r\n'|'\r'|'\n';

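The lexer greedily tries to match "d/dt(" as DDT. On the second line it sees "d", then "/", decides it is matching DDT, and when the expected "(" never arrives it throws a MismatchedTokenException!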
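The failure can be reproduced outside Antlr with a toy model. The sketch below is only an illustration of a lexer that commits to DDT once it sees "d/" and never backs up; it is not how Antlr's rule prediction is actually implemented, and the class name `CommittingLexerSketch` and the token spellings are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: commits to DDT on seeing "d/" and never backtracks.
// This is NOT Antlr's actual algorithm.
class CommittingLexerSketch {

    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < line.length()) {
            char c = line.charAt(pos);
            int start = pos;
            if (line.startsWith("d/", pos)) {
                // Committed to DDT: the whole literal must follow, or we fail.
                if (!line.startsWith("d/dt(", pos)) {
                    throw new IllegalStateException(
                        "mismatched input at position " + pos + ", expected d/dt(");
                }
                tokens.add("DDT");
                pos += "d/dt(".length();
            } else if (c >= 'a' && c <= 'z') {
                while (pos < line.length()
                        && line.charAt(pos) >= 'a' && line.charAt(pos) <= 'z') {
                    pos++;
                }
                tokens.add("ID:" + line.substring(start, pos));
            } else if (c >= '0' && c <= '9') {
                while (pos < line.length()
                        && line.charAt(pos) >= '0' && line.charAt(pos) <= '9') {
                    pos++;
                }
                tokens.add("NUMBER:" + line.substring(start, pos));
            } else {
                tokens.add(String.valueOf(c)); // single-character token such as = or )
                pos++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("d/dt(x)=a")); // tokenizes without error
        System.out.println(tokenize("a=d/dt"));    // throws: no "(" after "d/dt"
    }
}
```

In this model the first input line tokenizes, while the second fails at the "d" because the commitment to DDT cannot be undone.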

One workaround is to turn the lexer rules into parser rules:

grammar diffeq_grammar;

program :   (statement? NEWLINE)*;

statement
    :   diffeq
    |   assignment;

diffeq  :   ddt id ')' '=' id;

assignment
    :   id '=' number
    |   id '=' id '/' id
    ;

ddt :   'd' '/' 'd' 't' '(';
id  :   CHAR+;
number  :   DIGIT+;
CHAR    :   'a'..'z';
DIGIT   :   '0'..'9';
NEWLINE :   '\r\n'|'\r'|'\n';

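This moves what should be lexer rules into the parser, which I would like to avoid. What I really want is for the Antlr lexer to consider two rules, DDT and ID: if it starts on DDT and the match fails, it should fall back to ID.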

, (.. , . ).

lexer DDT Antlr... .

The target language is Java.

Thanks!

UPDATE

, , !! , . , , ( ), , . , ; , .


, (.. , . ).

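You can use a gated semantic predicate to look ahead in the char-stream and check that "d/dt(" really is ahead before the lexer commits to DDT.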

A demo:

grammar diffeq_grammar;

@parser::members {
  public static void main(String[] args) throws Exception {
    String src = 
        "d/dt(x)=a\n" +
        "a=d/dt\n" +
        "d=3\n" +
        "dt=4\n";
    diffeq_grammarLexer lexer = new diffeq_grammarLexer(new ANTLRStringStream(src));
    diffeq_grammarParser parser = new diffeq_grammarParser(new CommonTokenStream(lexer));
    parser.program();
  }
}

@lexer::members {
  private boolean ahead(String text) {
    for(int i = 0; i < text.length(); i++) {
      if(input.LA(i + 1) != text.charAt(i)) {
        return false;
      }
    }
    return true;
  }
}

program
 : (statement? NEWLINE)* EOF
 ;

statement
 : diffeq     {System.out.println("diffeq     : " + $text);}
 | assignment {System.out.println("assignment : " + $text);}
 ;

diffeq
 : DDT ID ')' '=' ID
 ;

assignment
 : ID '=' NUMBER
 | ID '=' ID '/' ID
 ;

DDT     : {ahead("d/dt(")}?=> 'd/dt(';
ID      : 'a'..'z'+;
NUMBER  : '0'..'9'+;
NEWLINE : '\r\n' | '\r' | '\n';

Generate the lexer and parser, compile, and run the demo:

java -cp antlr-3.3.jar org.antlr.Tool diffeq_grammar.g
javac -cp antlr-3.3.jar *.java
java -cp .:antlr-3.3.jar diffeq_grammarParser

(On Windows, use ; instead of : as the classpath separator.)

This prints:

diffeq     : d/dt(x)=a
assignment : a=d/dt
assignment : d=3
assignment : dt=4
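To see the lookahead idea in isolation, here is a small hand-written tokenizer in plain Java. It is only a sketch: `LookaheadGateSketch` and its token spellings are invented for this example, and it reimplements the check from the lexer's `ahead()` helper over a `String` rather than Antlr's `input.LA(...)`.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: only emit DDT when the full literal "d/dt(" is ahead;
// otherwise fall through to the ordinary ID / NUMBER rules.
class LookaheadGateSketch {

    // Plain-String analogue of the grammar's ahead() helper.
    static boolean ahead(String input, int pos, String text) {
        for (int i = 0; i < text.length(); i++) {
            if (pos + i >= input.length()
                    || input.charAt(pos + i) != text.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < line.length()) {
            char c = line.charAt(pos);
            int start = pos;
            if (ahead(line, pos, "d/dt(")) {
                tokens.add("DDT");
                pos += "d/dt(".length();
            } else if (c >= 'a' && c <= 'z') {
                while (pos < line.length()
                        && line.charAt(pos) >= 'a' && line.charAt(pos) <= 'z') {
                    pos++;
                }
                tokens.add("ID:" + line.substring(start, pos));
            } else if (c >= '0' && c <= '9') {
                while (pos < line.length()
                        && line.charAt(pos) >= '0' && line.charAt(pos) <= '9') {
                    pos++;
                }
                tokens.add("NUMBER:" + line.substring(start, pos));
            } else {
                tokens.add(String.valueOf(c)); // single-character token such as = or /
                pos++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String line : new String[] {"d/dt(x)=a", "a=d/dt", "d=3", "dt=4"}) {
            System.out.println(line + " -> " + tokenize(line));
        }
    }
}
```

Because the check happens before any character is consumed, "a=d/dt" falls through to the ID rule and tokenizes as an identifier, a slash, and another identifier.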

, , , , . , , , " " . , - , ( ): , - .

Here is a grammar that tokenizes d, dt, / and ( separately and lets the parser assemble them:

grammar diffeq_grammar;

program :   (statement? NEWLINE)* EOF; // <-- You forgot EOF

statement
    :   diffeq
    |   assignment;

diffeq  :   D OVER DT OPEN id CLOSE EQ id; // <-- here, id is a parser rule

assignment
    :   id EQ NUMBER
    |   id EQ id OVER id
    ;

id  : ID | D | DT; // <-- Nice trick, isn't it?

D       : 'd';
DT      : 'dt';
OVER    : '/';
EQ      : '=';
OPEN    : '(';
CLOSE   : ')';
ID      : 'a'..'z'+;
NUMBER  : '0'..'9'+;
NEWLINE : '\r\n'|'\r'|'\n';

You may need to enable backtracking and memoization for this to work.
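The `id : ID | D | DT` trick can be pictured in plain Java as follows. This is a sketch under assumptions: `KeywordAsIdSketch`, `classify`, and `isId` are hypothetical names, and `classify` expects a word that is already a run of lowercase letters.

```java
// Sketch: the lexer gives "d" and "dt" their own token types,
// and the parser-level id rule accepts them wherever an ID is allowed.
class KeywordAsIdSketch {

    // Lexer side: D and DT are listed before ID, so they win for exact matches.
    static String classify(String word) {
        if (word.equals("d")) {
            return "D";
        }
        if (word.equals("dt")) {
            return "DT";
        }
        return "ID";
    }

    // Parser side: id : ID | D | DT;
    static boolean isId(String tokenType) {
        return tokenType.equals("ID")
            || tokenType.equals("D")
            || tokenType.equals("DT");
    }

    public static void main(String[] args) {
        for (String word : new String[] {"d", "dt", "x", "ddt"}) {
            String type = classify(word);
            System.out.println(word + " -> " + type + ", usable as id: " + isId(type));
        }
    }
}
```

The point of the design is that the lexer stays context-free: it never has to decide whether "d" is a variable or part of "d/dt(", because the parser accepts the D token in both roles.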


Here is the solution I finally used. I know it violates one of my requirements, keeping lexer rules in the lexer and parser rules in the parser, but as it turned out, moving DDT to ddt required no changes to my code. Also, dasblinkenlight makes some remarks about the mismatched parentheses in his answer and comments.

grammar ddt_problem;

program :   (statement? NEWLINE)*;

statement
    :   diffeq
    |   assignment;

diffeq  :   ddt ID ')' '=' ID;

assignment
    :   ID '=' NUMBER
    |   ID '=' ID '/' ID
    ;

ddt :   ( d=ID ) { $d.getText().equals("d") }? '/' ( dt=ID ) { $dt.getText().equals("dt") }? '(';
ID  :   'a'..'z'+;
NUMBER  :   '0'..'9'+;
NEWLINE :   '\r\n'|'\r'|'\n';
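The checks that the two semantic predicates in the ddt rule perform can be sketched in plain Java. The `Token` class here is a hypothetical stand-in invented for this example, not Antlr's `org.antlr.runtime.Token`, and `matchesDdt` is an illustrative helper rather than generated parser code.

```java
import java.util.Arrays;
import java.util.List;

// Sketch: the lexer only produces generic ID tokens; the parser
// inspects their text to decide whether they spell out d / dt (.
class DdtPredicateSketch {

    // Hypothetical token shape: a type name plus the matched text.
    static final class Token {
        final String type;
        final String text;

        Token(String type, String text) {
            this.type = type;
            this.text = text;
        }
    }

    static boolean isId(Token t, String expected) {
        return t.type.equals("ID") && t.text.equals(expected);
    }

    // Mirrors the ddt rule: ID("d") '/' ID("dt") '(' starting at index i.
    static boolean matchesDdt(List<Token> tokens, int i) {
        return i + 3 < tokens.size()
            && isId(tokens.get(i), "d")
            && tokens.get(i + 1).text.equals("/")
            && isId(tokens.get(i + 2), "dt")
            && tokens.get(i + 3).text.equals("(");
    }

    public static void main(String[] args) {
        List<Token> diffeq = Arrays.asList(
            new Token("ID", "d"), new Token("OP", "/"),
            new Token("ID", "dt"), new Token("OP", "("),
            new Token("ID", "x"), new Token("OP", ")"));
        System.out.println(matchesDdt(diffeq, 0));
    }
}
```

Since the lexer emits plain ID tokens in both contexts, "a=d/dt" never triggers a lexer error; the text comparison only matters where the parser is trying the ddt rule.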