How do purely functional compilers annotate ASTs with type information?

Question

How do purely functional compilers annotate ASTs with type information?

In the syntax analysis phase, the imperative compiler can build the AST from nodes that already contain the type field, which was set during the construction of null , and then, at the stage of semantic analysis, fill the types by assigning the declared / supposed types to the type fields.

How do purely functional languages work where you don't have the luxury of destination? Is an AST without an AST a different type of enriched AST? Does this mean that I need to define two types for the AST node, one for the syntax phase and one for the semantic phase?

Are there purely functional programming tricks that help the compiler author with this problem?

+8

compiler-construction scala functional-programming haskell f #

fredoverflow Apr 12 '15 at 9:55

source share

4 answers

I usually rewrite the AST source (or several steps already) in a new form, replacing each expression node with a pair (tag, expression) .

Tags are unique numbers or characters that are then used by the next pass, which derives type equations from the AST. For example, a + b will give something like { numeric(Tag_a). numeric(Tag_b). equals(Tag_a, Tag_b). equals(Tag_e, Tag_a). }.

Then the types of equations are solved (for example, simply by running them as a Prolog program), and, if successful, all tags (which are variables in this program) are now bound to specific types, and if not, re as type parameters.

In the next step, our previous AST is overwritten again, this time replacing the tags with all the information about the proposed type.

The whole process is a sequence of clean rewrites; you don't need to destroy anything in your AST. A typical compilation pipeline can take dozens of rewrites, some of which change the AST data type.

+6

SK-logic Apr 12 '15 at 11:45

source share

There are several modeling options for this. You can use the same type of data fields with a null value, as in your imperative case:

 data Exp = Var Name (Maybe Type) | ... parse :: String -> Maybe Exp -- types are Nothings here typeCheck :: Exp -> Maybe Exp -- turns Nothings into Justs

or even using a more accurate type

 data Exp ty = Var Name ty | ... parse :: String -> Maybe (Exp ()) typeCheck :: Exp () -> Maybe (Exp Type)

+3

chi Apr 12 '15 at 10:36

source share

I can’t say how this should be done, but I did it in F # for the C # compiler here

The approach was mainly - built the AST from the source, leaving things like information like unconstrained. So AST.fs is basically an AST that contains type names, function names, etc.

As the AST begins to compile (in this case) .NET IL, we get more information about the type (we create types in the source - allow these stub types). This then gives us the information needed to create stub labels (the code can have signatures that include stub types as well as built-in types). From here we now have enough type information to resolve any type names or method signatures in the code.

I store this in the TypedAST.fs file. I do this in one go, but the approach may be naive.

Now we have a fully typed AST, then you can do things like compile it, fully analyze it, or as you like.

So, in response to the question “Does this mean that I need to define two types for the AST node, one for the syntax phase and one for the semantic phase?”, I can’t say definitively that this is so, but it is certainly what I did, and it looks like MS did with Roslyn (although they essentially decorated the original tree with the info IIRC type)

"Are there purely functional programming tricks that help the compiler author with this problem?" Given that AST is mostly reflected in my case, one could generalize it and transform the tree, but the code can be (more) terrible.

i.e.

 type 'type AST;
 |  MethodInvoke of 'type * Name *' type list
 |  ....

+2

neil danson Apr 13 '15 at 14:48

source share

Ingo · Accepted Answer · 2015-04-19T11:14:24+0000

As with relational databases, in functional programming it is often a good idea not to put everything in a single data structure.

In particular, there cannot be a data structure that is "AST".

Most likely, there will be data structures that represent the parsed expressions. One of the possible ways to process type information is to assign a unique identifier (for example, an integer) for each node of the tree already during parsing and have some suitable data structure (for example, a hash map) that associates these node -ids with types. Thus, specifying a type inference pass will be easy to create this map.

How do purely functional compilers annotate ASTs with type information?

More articles: