Abstract syntax tree

I am currently working on a compiler for C, and I'm lost in the part where we build the data structure for the AST, especially for the part where we build the structure for identifiers, it is called "Entry Table of Symbol"

I see structures above the network, such as:

struct ste { struct id *name; /* pointer into hash table for assoc. id */ struct decl *decl; /* pointer into symbol table for its decl */ struct ste *prev; /* pointer to previous entry in symbol table */ }; 

It looks like a linked list because it contains a pointer to the previous entry (* prev), but what is the logic behind this?

+4
source share
3 answers

The answer to your specific question is this: the prev link means that when your code has a pointer to one of these node, it can follow the link to the previous link in the chain. One of the reasons a character table can have such a list is to deal with a nested area:

 { int x; { int x; } } 

However, there are many, many other reasons why character nodes can be ordered in a list. Any reason the compiler should visit all nodes is the reason.

+8
source

You have long seen the remnants of addiction for C programmers: it is assumed that characters will be on some lists, and instead of highlighting list structures separately, list pointers are included as part of the character structure. This trick saves one distribution in the list element, but at a cost: the set of lists on which the symbol can be included is fixed, and this structure confuses programmers. If the application is a compiler, there is no reason to use this trick. It is much clearer to have a separate list structure, which is defined something like this:

 struct ste_list { struct ste *symbol_table_entry; struct str_list *next; }; 

You can have as much as you like, and no one is wiser. And the internal pointers that you find obscure go away.

You're asking

What is the logic behind this?

Part of the answer is that it is useful to have characters in the highlighted list. I can’t answer the question definitively without knowing more about a specific compiler. It’s best to assume that the prev entry will be used to implement nested areas (brackets { ... } in C), but this assumption is based on the compilers I saw or worked. Thus, perhaps the logic is that when a trailing parenthesis occurs, the compiler can follow this link until it reaches ste in the enclosing area. People with a little more experience than the author of the compiler you are studying usually put this logic in a “symbol table abstraction,” which will include functions such as enterscope() and exitscope() , and the details of these operations will be hidden from the inside represent individual character table entries.

+2
source

My first thought on using a linked list with a reverse direction would be for those languages ​​that support variable name redefinition, for example:

 int main (void) { int x = 1; int y = 1; if (x == 1) { int y = 2; printf ("y = %d\n", y); } return 0; } 

In this case, you want to access the variable with the innermost scope (the latter is defined). This can be found by walking backward through the list (provided that you create the list by going forward, of course).

Then, when the scope disappears, you can also simply adjust the "head" pointer to remove the last added variables.

Of course, you can achieve the same effect by inserting it in front of the current voice, and not adding it to the end of the list (which looks conceptually what is being done, only with the prev pointer instead of next ).

+1
source

All Articles