Abstract tree syntax construction and traversal

I do not understand the structure of abstract syntax trees. To go "down" (forward) "in the source of the program that AST represents, do you go directly to the very top of the node, or do you go down? For example, will there be an example program

a = 1 b = 2 c = 3 d = 4 e = 5 

The result is in AST, which is as follows: enter image description here

or that: enter image description here

Where in the first, the “right” on the main node will push you through the program, but in the second, just following the next pointer on each node will do the same.

It seems that the second will be more correct, since you do not need something like a special type of node with a potentially extremely long array of pointers for the very first node. Although, I see that the second one becomes more complicated than the first when you get into for and if loops and more complicated things.

+7
source share
5 answers

The first view is more typical, although the second is compatible with constructing a tree as a recursive data structure that can be used when the implementation platform is functional rather than imperative.

Consider:

enter image description here

This is your first example, with the exception of the abbreviated and “main” node (conceptual straw) of the more appropriate name “block” to reflect the general construction of the “block” containing a sequence of statements in an imperative programming language. Different types of nodes have different types of children, and sometimes these children include collections of auxiliary nodes, the order of which is important, as is the case with the “block”. The same thing can happen, for example, when initializing an array:

 int[] arr = {1, 2} 

Consider how this can be represented in the syntax tree:

enter image description here

Here, array-literal-type node also has several children of the same type, whose order is important.

+5
source

Where in the first, going "to the right", on the main node, you will advance you through the program, but in the second one just following the pointer on each node will do the same.

It seems that the second will be more correct, since you do not need something like a special type of node with a potentially extremely long array of pointers for the very first node

I would almost always prefer the first approach, and I think it will be much easier for you to build your AST if you do not need to maintain a pointer to the next node.

I think that, as a rule, it is easiest to deduce all objects from a common base class, similarly to this:

 abstract class Expr { } class Block : Expr { Expr[] Statements { get; set; } public Block(Expr[] statements) { ... } } class Assign : Expr { Var Variable { get; set; } Expr Expression { get; set; } public Assign(Var variable, Expr expression) { ... } } class Var : Expr { string Name { get; set; } public Variable(string name) { ... } } class Int : Expr { int Value { get; set; } public Int(int value) { ... } } 

The resulting AST is as follows:

 Expr program = new Block(new Expr[] { new Assign(new Var("a"), new Int(1)), new Assign(new Var("b"), new Int(2)), new Assign(new Var("c"), new Int(3)), new Assign(new Var("d"), new Int(4)), new Assign(new Var("e"), new Int(5)), }); 
+4
source

It depends on the language. In C, you will need to use the first form to capture the concept of a block, since a block has a variable scope:

 { { int a = 1; } // a doesn't exist here } 

The scope of the variable will be an attribute of what you call the "main node".

+1
source

I believe your first version makes more sense for several reasons.

Firstly, the first more clearly demonstrates the "nesting" of the program, and is also clearly implemented as a root tree (which is the usual concept of a tree).

The second and more important reason is that your “main node” could indeed be a “branch branch” (for example), which might just be another node in a larger AST. Thus, your AST can be viewed in a recursive sense, where each AST is a node with other ASTs as children. This makes the design of the former much simpler, more general and very uniform.

+1
source

Suggestion: when working with tree data structures, getter is an AST or another type associated with the compiler, it always uses one "root" node, it can help you perform operations and have more control:

 class ASTTreeNode { bool isRoot() {...} string display() { ... } // ... } void main () { ASTTreeNode MyRoot = new ASTTreeNode(); // ... // prints the root node, plus each subnode recursively MyRoot.Show(); } 

Greetings.

0
source

All Articles