Parse tree generation using Java CUP

Question

Parse tree generation using Java CUP

I am using CUP with JFlex to check expression syntax. I have basic functionality: I can tell if the expression is valid or not.

The next step is to implement simple arithmetic operations, such as "add 1". For example, if my expression is "1 + a", the result should be "2 + a". I need access to the parse tree to do this because a simple definition of a numeric expression will not do this: the result of adding 1 to "(1 + a) * b" should be "(1 + a) * b + 1", and not "(2 + a) * b".

Does anyone have an example of a CUP that generates a parse tree? I think I can take it from there.

As an added bonus, is there a way to get a list of all tokens in an expression using JFlex? Seems like a typical use case, but I can't figure out how to do this.

Edit: Find a promising stack overflow hint: Create an abstract tree problem from the parser

Discussion of CUP and AST:

http://pages.cs.wisc.edu/~fischer/cs536.s08/lectures/Lecture16.4up.pdf

In particular, this paragraph:

The character returned by the parser is associated with the launch of the grammar character and contains the AST for the entire source program

This will not help. How to navigate a tree using a Symbol instance if the Symbol class does not have navigation pointers for its children? In other words, it does not look or behaves like a node tree:

package java_cup.runtime; /** * Defines the Symbol class, which is used to represent all terminals * and nonterminals while parsing. The lexer should pass CUP Symbols * and CUP returns a Symbol. * * @version last updated: 7/3/96 * @author Frank Flannery */ /* **************************************************************** Class Symbol what the parser expects to receive from the lexer. the token is identified as follows: sym: the symbol type parse_state: the parse state. value: is the lexical value of type Object left : is the left position in the original input file right: is the right position in the original input file ******************************************************************/ public class Symbol { /******************************* Constructor for l,r values *******************************/ public Symbol(int id, int l, int r, Object o) { this(id); left = l; right = r; value = o; } /******************************* Constructor for no l,r values ********************************/ public Symbol(int id, Object o) { this(id, -1, -1, o); } /***************************** Constructor for no value ***************************/ public Symbol(int id, int l, int r) { this(id, l, r, null); } /*********************************** Constructor for no value or l,r ***********************************/ public Symbol(int sym_num) { this(sym_num, -1); left = -1; right = -1; value = null; } /*********************************** Constructor to give a start state ***********************************/ Symbol(int sym_num, int state) { sym = sym_num; parse_state = state; } /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ /** The symbol number of the terminal or non terminal being represented */ public int sym; /*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/ /** The parse state to be recorded on the parse stack with this symbol. * This field is for the convenience of the parser and shouldn't be * modified except by the parser. */ public int parse_state; /** This allows us to catch some errors caused by scanners recycling * symbols. For the use of the parser only. [CSA, 23-Jul-1999] */ boolean used_by_parser = false; /******************************* The data passed to parser *******************************/ public int left, right; public Object value; /***************************** Printing this token out. (Override for pretty-print). ****************************/ public String toString() { return "#"+sym; } }

+4

parsing jflex grammar lexer cup

user443854 May 17, '11 at 15:33

source share

2 answers

To generate trees using the Cup, I use a modified version of the Cup, which you can find at this link: http://www.skenz.it/compilers/resources/java_cup_v10k_drawTree.zip

If you want more information about this, I suggest you visit the website of my teacher in the programming part of the course of formal languages and compilers at the Turin Polytechnic University: http://www.skenz.it/compilers/

I hope this can be useful for you! :)

0

Sarcares Sep 14 '16 at 16:58

source share

user443854 · Accepted Answer · 2011-05-20T17:17:32+0000

OK I understood. But unfortunately, I can’t post all my code here as it is. In any case, I will try to find a solution, and please ask questions if something is unclear.

JFlex uses its own Symbol class. Look here: JFlex.jar / java_cup.runtime / Symbol.class

You will see a couple of added constructors:

 public Symbol(int id, Symbol left, Symbol right, Object o){ this(id,left.left,right.right,o); } public Symbol(int id, Symbol left, Symbol right){ this(id,left.left,right.right); }

The key here is Object o , which is the value of the character.

Define your own class to represent the AST node tree, and another to represent the lexer token. Of course, you can use the same class, but I found it more understandable to use different classes to distinguish between the two. Both JFlex and CUP will generate java code, and it is easy to get your tokens and nodes.

Then in your parser.flex in the lexical rule sections you want to do something like this for each token:

 {float_lit} { return symbol(sym.NUMBER, createToken(yytext(), yycolumn)); }

Do this for all your tokens. Your createToken might look something like this:

 %{ private LexerToken createToken(String val, int start) { LexerToken tk = new LexerToken(val, start); addToken(tk); return tk; } }%

Now let's move on to parser.cup. Declare all your terminals of type LexerToken , and all of your non-terminals will be of type Node . You want to read the CUP manual, but for a quick update, the terminal will be recognized by the lexer (e.g., numbers, variables, operators), and the non-terminal will be part of your grammar (e.g., expression, coefficient, term ...).

Finally, all this is included in the definition of grammar. Consider the following example:

  factor ::= factor:f TIMES:times term:t {: RESULT = new Node(times.val, f, t, times.start); :} | factor:f DIVIDE:div term:t {: RESULT = new Node(div.val, f, t, div.start); :} | term:t {: RESULT = t; :} ;

The syntax factor:f means that you use the value of the parameter f , and you can refer to it in the next section {: ... :} . Remember that our terminals have values of type LexerToken , and our non-terminals have values of Node s.

Your term in an expression can have the following definition:

  term ::= LPAREN expr:e RPAREN {: RESULT = new Node(e.val, e.start); :} | NUMBER:n {: RESULT = new Node(n.val, n.start); :} ;

When you successfully generate the parser code, you will see in your parser.java the part where the parent-child relationship between the nodes is established:

  case 16: // term ::= UFUN LPAREN expr RPAREN { Node RESULT =null; int ufleft = ((java_cup.runtime.Symbol)CUP$parser$stack.elementAt(CUP$parser$top-3)).left; int ufright = ((java_cup.runtime.Symbol)CUP$parser$stack.elementAt(CUP$parser$top-3)).right; LexerToken uf = (LexerToken)((java_cup.runtime.Symbol) CUP$parser$stack.elementAt(CUP$parser$top-3)).value; int eleft = ((java_cup.runtime.Symbol)CUP$parser$stack.elementAt(CUP$parser$top-1)).left; int eright = ((java_cup.runtime.Symbol)CUP$parser$stack.elementAt(CUP$parser$top-1)).right; Node e = (Node)((java_cup.runtime.Symbol) CUP$parser$stack.elementAt(CUP$parser$top-1)).value; RESULT = new Node(uf.val, e, null, uf.start); CUP$parser$result = parser.getSymbolFactory().newSymbol("term",0, ((java_cup.runtime.Symbol)CUP$parser$stack.elementAt(CUP$parser$top-3)), ((java_cup.runtime.Symbol)CUP$parser$stack.peek()), RESULT); } return CUP$parser$result;

I'm sorry that I cannot post a complete code sample, but hopefully this will save someone a few hours of trial and error. Not having complete code is also good, because it will not render all of these CS homework useless.

As proof of life, here's a pretty printed version of my AST sample.

Introductory expression:

 T21 + 1A / log(max(F1004036, min(a1, a2))) * MIN(1B, 434) -LOG(xyz) - -3.5+10 -.1 + .3 * (1)

Resulting AST:

 |--[+] |--[-] | |--[+] | | |--[-] | | | |--[-] | | | | |--[+] | | | | | |--[T21] | | | | | |--[*] | | | | | |--[/] | | | | | | |--[1A] | | | | | | |--[LOG] | | | | | | |--[MAX] | | | | | | |--[F1004036] | | | | | | |--[MIN] | | | | | | |--[A1] | | | | | | |--[A2] | | | | | |--[MIN] | | | | | |--[1B] | | | | | |--[434] | | | | |--[LOG] | | | | |--[XYZ] | | | |--[-] | | | |--[3.5] | | |--[10] | |--[.1] |--[*] |--[.3] |--[1]

Parse tree generation using Java CUP

More articles: