OK, I understand. Unfortunately, I can't post all of my code here as it is. In any case, I will try to outline the solution; please ask questions if something is unclear.
JFlex uses its own Symbol class. Look here: JFlex.jar / java_cup.runtime / Symbol.class
You will see a couple of added constructors:
public Symbol(int id, Symbol left, Symbol right, Object o) {
    this(id, left.left, right.right, o);
}
public Symbol(int id, Symbol left, Symbol right) {
    this(id, left.left, right.right);
}
The key here is Object o, which is the value carried by the symbol.
Define your own class to represent the nodes of the AST, and another to represent the lexer tokens. Of course, you can use the same class for both, but I found it clearer to use separate classes to distinguish between the two. Both JFlex and CUP generate Java code, so it is easy to get at your tokens and nodes.
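As a rough illustration of those two classes, here is a minimal sketch. The field names val and start and the two Node constructors are inferred from how they are used in the grammar actions in this answer; the left and right field names are my own assumption, and a real implementation may well store more (for example a list of children for function calls).

```java
// Hypothetical token class: the value attached to every terminal symbol.
class LexerToken {
    final String val;  // matched text, e.g. "3.5" or "*"
    final int start;   // column where the match begins

    LexerToken(String val, int start) {
        this.val = val;
        this.start = start;
    }
}

// Hypothetical AST node class: the value attached to every non-terminal.
class Node {
    final String val;
    final int start;
    final Node left;   // assumed name; null for leaf nodes
    final Node right;  // assumed name; null for leaf nodes

    // Leaf node, e.g. a number literal.
    Node(String val, int start) {
        this(val, null, null, start);
    }

    // Binary operator node with two children.
    Node(String val, Node left, Node right, int start) {
        this.val = val;
        this.left = left;
        this.right = right;
        this.start = start;
    }
}
```

With these in place, an action such as new Node(times.val, f, t, times.start) simply copies the operator text and position out of the token and hangs the two sub-trees underneath it.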
Then, in the lexical rules section of your parser.flex, do something like this for each token:
{float_lit} { return symbol(sym.NUMBER, createToken(yytext(), yycolumn)); }
Do this for all your tokens. Your createToken might look something like this:
%{
    private LexerToken createToken(String val, int start) {
        LexerToken tk = new LexerToken(val, start);
        addToken(tk);  // addToken is defined elsewhere (not shown here)
        return tk;
    }
%}
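The lexical rule above also calls a symbol(...) helper that is not shown. A minimal sketch of such a helper, of the kind commonly used in JFlex/CUP examples, might look like the following. It is an assumption on my part, not the exact code from my project: it would live in the same user-code section as createToken, and it assumes the %cup, %line and %column directives are enabled so that yyline and yycolumn are available.

```java
// Hypothetical helper: wraps a LexerToken in the Symbol that CUP expects.
// The fourth argument becomes the "Object o" value of the Symbol.
private java_cup.runtime.Symbol symbol(int type, Object value) {
    return new java_cup.runtime.Symbol(type, yyline, yycolumn, value);
}
```

Whatever you pass as value here is what later shows up as the labelled value of the terminal in the CUP grammar actions.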
Now let's move on to parser.cup. Declare all of your terminals with type LexerToken and all of your non-terminals with type Node. You will want to read the CUP manual, but as a quick refresher: terminals are what the lexer recognizes (e.g., numbers, variables, operators), and non-terminals are the parts of your grammar (e.g., expression, factor, term ...).
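For illustration, the declarations in parser.cup might look roughly like this. The names are the ones used in the grammar rules in this answer; a complete grammar would need more terminals (for example for addition, subtraction and function names), which I am leaving out here rather than guessing at.

```
/* Terminals carry the LexerToken created by the lexer. */
terminal LexerToken NUMBER, TIMES, DIVIDE, LPAREN, RPAREN;

/* Non-terminals carry the Node built by the grammar actions. */
non terminal Node expr, factor, term;
```

Because the types are declared here, CUP generates the casts for you, so inside an action a label such as times is already a LexerToken and f is already a Node.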
Finally, all of this comes together in the grammar definition. Consider the following example:
factor ::= factor:f TIMES:times term:t
           {: RESULT = new Node(times.val, f, t, times.start); :}
         | factor:f DIVIDE:div term:t
           {: RESULT = new Node(div.val, f, t, div.start); :}
         | term:t
           {: RESULT = t; :}
         ;
The syntax factor:f binds the value of that factor to the label f, which you can then refer to in the action section {: ... :} that follows. Remember that our terminals carry values of type LexerToken, and our non-terminals carry values of type Node.
Your term in an expression can have the following definition:
term ::= LPAREN expr:e RPAREN
         {: RESULT = new Node(e.val, e.start); :}
       | NUMBER:n
         {: RESULT = new Node(n.val, n.start); :}
       ;
Once the parser code has been generated successfully, you will find in your parser.java the part where the parent-child relationships between the nodes are established:
case 16:
I'm sorry that I cannot post a complete code sample, but hopefully this will save someone a few hours of trial and error. Not having complete code is also a good thing, because it will not render all those CS homework assignments useless.
As proof of life, here's a pretty-printed version of my sample AST.
Input expression:
T21 + 1A / log(max(F1004036, min(a1, a2))) * MIN(1B, 434) -LOG(xyz) - -3.5+10 -.1 + .3 * (1)
Resulting AST:
|--[+]
   |--[-]
   |  |--[+]
   |  |  |--[-]
   |  |  |  |--[-]
   |  |  |  |  |--[+]
   |  |  |  |  |  |--[T21]
   |  |  |  |  |  |--[*]
   |  |  |  |  |     |--[/]
   |  |  |  |  |     |  |--[1A]
   |  |  |  |  |     |  |--[LOG]
   |  |  |  |  |     |     |--[MAX]
   |  |  |  |  |     |        |--[F1004036]
   |  |  |  |  |     |        |--[MIN]
   |  |  |  |  |     |           |--[A1]
   |  |  |  |  |     |           |--[A2]
   |  |  |  |  |     |--[MIN]
   |  |  |  |  |        |--[1B]
   |  |  |  |  |        |--[434]
   |  |  |  |  |--[LOG]
   |  |  |  |     |--[XYZ]
   |  |  |  |--[-]
   |  |  |     |--[3.5]
   |  |  |--[10]
   |  |--[.1]
   |--[*]
      |--[.3]
      |--[1]
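If you want to print your own tree in a similar layout, a small recursive printer is enough. The following is only a sketch under my own assumptions, not the code that produced the output above: TreeNode and AstPrinter are hypothetical names, and the node keeps a list of children so that function calls with any number of arguments fit too.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical tree node with a label and any number of children.
class TreeNode {
    final String label;
    final List<TreeNode> children = new ArrayList<>();

    TreeNode(String label, TreeNode... kids) {
        this.label = label;
        for (TreeNode kid : kids) {
            children.add(kid);
        }
    }
}

// Hypothetical pretty printer producing one "|--[label]" line per node.
class AstPrinter {
    static String print(TreeNode root) {
        StringBuilder sb = new StringBuilder();
        sb.append("|--[").append(root.label).append("]\n");
        printChildren(root, "   ", sb);
        return sb.toString();
    }

    private static void printChildren(TreeNode node, String prefix, StringBuilder sb) {
        for (int i = 0; i < node.children.size(); i++) {
            TreeNode child = node.children.get(i);
            boolean last = (i == node.children.size() - 1);
            sb.append(prefix).append("|--[").append(child.label).append("]\n");
            // Under the last child there are no more siblings, so pad with
            // spaces; otherwise keep a vertical bar running down the column.
            printChildren(child, prefix + (last ? "   " : "|  "), sb);
        }
    }
}
```

For example, AstPrinter.print(new TreeNode("*", new TreeNode(".3"), new TreeNode("1"))) returns three lines: the [*] node followed by [.3] and [1] indented beneath it.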