How classes are implemented in compilers

I would like to implement a class type for my own little language, but what I first thought would not be too hard has me stumped. I have a parser in place; it is the code generation side that I am having problems with. Can someone shed some light on the best/right way to go about this? In particular, I would like to do this in LLVM, so while I need to know the general approach, any specific LLVM code I should be working with would be fantastic.

Thanks T.


NB. My experience with LLVM comes mainly from the Kaleidoscope tutorials plus a little extra from playing around with it, but I am far from fully understanding the LLVM API.

+7
compiler-construction language-design llvm
3 answers

A very, very incomplete overview:

A class is a structure (you know C/C++, right?).

Methods are ordinary functions, except that they receive an additional implicit argument: the object itself. This argument is usually called 'this' or 'self' inside the function. Class-scope symbols may (C++, Java) or may not (PHP, Python) be accessible by default within methods.
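To make the first two points concrete, here is a minimal sketch of what a compiler might emit for a tiny class. The names `Point`, `Point_length2` and `Point_move` are made up for illustration, and the `Class_method` naming scheme is just one possible mangling convention.

```cpp
// Source-language class (hypothetical):
//   class Point { int x; int y; int length2(); void move(int dx, int dy); }
// Lowered form: a plain struct holding the fields...
struct Point {
    int x;
    int y;
};

// ...plus ordinary functions. The receiver becomes an explicit first
// parameter, called `self` here (the name is arbitrary).
int Point_length2(const Point* self) {
    return self->x * self->x + self->y * self->y;
}

// A mutating method takes a non-const receiver.
void Point_move(Point* self, int dx, int dy) {
    self->x += dx;
    self->y += dy;
}
```

A call such as `p.move(1, 2)` in the source language would then be compiled as `Point_move(&p, 1, 2)`.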

Inheritance essentially flattens the structures and possibly also merges the symbol tables, since the symbols of the base class are usually accessible by default from the methods of the class you are currently parsing. When you come across a symbol (field or method) inside a method, you need to do an upward lookup, starting from the current class and ascending the hierarchy. Alternatively, you can implement it so that you look in only one symbol table, which is the result of the merge.
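A rough sketch of both ideas on the compiler side, assuming single inheritance. `ClassInfo`, `flattenFields`, `findDeclaringClass` and `fieldIndex` are hypothetical names, not part of any real compiler API.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical compile-time description of a class.
struct ClassInfo {
    std::string name;
    const ClassInfo* base;                // nullptr for a root class
    std::vector<std::string> ownFields;   // fields declared in this class only
};

// Flattened layout: base-class fields first, then the class's own fields.
std::vector<std::string> flattenFields(const ClassInfo& cls) {
    std::vector<std::string> fields;
    if (cls.base) fields = flattenFields(*cls.base);
    fields.insert(fields.end(), cls.ownFields.begin(), cls.ownFields.end());
    return fields;
}

// Upward lookup: start at the current class and ascend the hierarchy.
// Returns the class that declares `field`, or nullptr if nobody does.
const ClassInfo* findDeclaringClass(const ClassInfo& cls, const std::string& field) {
    for (const ClassInfo* c = &cls; c != nullptr; c = c->base) {
        for (const std::string& f : c->ownFields) {
            if (f == field) return c;
        }
    }
    return nullptr;
}

// Merged-table variant: slot index of `field` in the flattened layout, or -1.
// Searching from the back lets a derived field shadow a base field.
int fieldIndex(const ClassInfo& cls, const std::string& field) {
    const std::vector<std::string> fields = flattenFields(cls);
    for (std::size_t i = fields.size(); i > 0; --i) {
        if (fields[i - 1] == field) return static_cast<int>(i - 1);
    }
    return -1;
}
```

The index returned by `fieldIndex` is the kind of value you would feed into a struct field access during code generation.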

Virtual methods are called indirectly. In some languages, all methods are virtual by default. The implementation depends on the kind of language. In a fully dynamic language you always look up the function by name inside the class at runtime, so all your methods become virtual automatically. For static languages, compilers usually create so-called virtual method tables. I'm not sure you need this at all, so I will not go into details here.

Constructors are special methods that are called either when a new object is built (usually with 'new') or as part of the constructor call chain from within child constructors. Many different implementations are possible here. One of them is that the constructor takes an implicit 'this' argument, which can be NULL if the object has not been allocated yet, and returns it as well.
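Here is a sketch of that last variant, assuming heap allocation with `malloc` and a derived class laid out with its base part first. `Base`, `Derived`, `Base_ctor` and `Derived_ctor` are illustrative names.

```cpp
#include <cstdlib>

struct Base    { int id; };
struct Derived { Base base; int extra; };   // flattened: base part comes first

// Lowered constructor. `self` is null when called for a `new` expression
// (the constructor allocates), and non-null when a child constructor is
// initializing its embedded base part in place.
Base* Base_ctor(Base* self, int id) {
    if (!self) self = static_cast<Base*>(std::malloc(sizeof(Base)));
    if (!self) return nullptr;              // allocation failed
    self->id = id;
    return self;
}

Derived* Derived_ctor(Derived* self, int id, int extra) {
    if (!self) self = static_cast<Derived*>(std::malloc(sizeof(Derived)));
    if (!self) return nullptr;              // allocation failed
    Base_ctor(&self->base, id);             // constructor chain: base part first
    self->extra = extra;
    return self;
}
```

With this scheme `new Derived(7, 9)` becomes `Derived_ctor(nullptr, 7, 9)`, while a stack object is built by passing the address of its storage.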

Destructors are ordinary methods that are usually called implicitly when an object goes out of scope. Again, you need to account for the upward call chain of destructors.
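A small sketch of the destructor chain, assuming the C++ convention that the derived part is cleaned up before the base part (your language may choose otherwise). All names, including the `g_log` vector used to record the order, exist only for this demonstration.

```cpp
#include <string>
#include <vector>

std::vector<std::string> g_log;   // records destructor order for the demo

struct Base    { int id; };
struct Derived { Base base; int extra; };

void Base_dtor(Base* self) {
    g_log.push_back("Base");
    self->id = 0;                 // stand-in for releasing base resources
}

void Derived_dtor(Derived* self) {
    g_log.push_back("Derived");   // clean up the derived part first...
    self->extra = 0;
    Base_dtor(&self->base);       // ...then continue up the chain
}

// Source-language block `{ Derived d; ... }` after lowering:
void scopeExample() {
    Derived d = {{1}, 2};
    // ... body of the block ...
    Derived_dtor(&d);             // call inserted by the compiler at scope exit
}
```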

Interfaces are complex unless your language is fully dynamic.

+6

You should get Stanley Lippman's Inside the C++ Object Model. Everything you need is there.

+4

There are probably several strategies for implementing this; here is one of them:

A vtable (virtual table) is a compile-time constant structure of function pointers. (All values are known at compile time.)

(You can call the pointer to the vtable an "interface" if you want.)

A class in an OOP language that supports inheritance is a structure that contains a const pointer to its vtable as its first member variable. This pointer identifies the exact type of the object and, with multiple inheritance, the aspect/view of the object (that is, which type it has been cast to).
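A minimal sketch of this layout, assuming single inheritance. `Shape`, `Rect`, `Square` and the helper functions are hypothetical names chosen for the example.

```cpp
struct Shape;   // forward declaration for the function-pointer signature

// The vtable: a constant struct of function pointers.
struct ShapeVTable {
    int (*area)(const Shape* self);
};

// Every object starts with a pointer to its class's constant vtable.
struct Shape {
    const ShapeVTable* vptr;
};

// Derived classes embed the base part first, so a Shape* also points
// at the start of the full object.
struct Rect   { Shape base; int w; int h; };
struct Square { Shape base; int side; };

int Rect_area(const Shape* self) {
    const Rect* r = reinterpret_cast<const Rect*>(self);
    return r->w * r->h;
}

int Square_area(const Shape* self) {
    const Square* s = reinterpret_cast<const Square*>(self);
    return s->side * s->side;
}

// One vtable per class, fully known at compile time.
const ShapeVTable kRectVTable   = {Rect_area};
const ShapeVTable kSquareVTable = {Square_area};

// A virtual call: load the vtable pointer, load the slot, call indirectly.
int callArea(const Shape* s) {
    return s->vptr->area(s);
}

// The vtable pointer doubles as an exact type identity.
bool isRect(const Shape* s) {
    return s->vptr == &kRectVTable;
}
```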

If you want multiple inheritance, then a (static_)cast from a pointer to the derived class to a pointer to one of its parent classes has to correct the byte address on the fly. This can be implemented with a single virtual function or (better) with an offset value stored in the vtable.
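The address correction can be sketched like this, assuming a hypothetical class `C` that inherits from both `A` and `B` and is laid out as the `A` part followed by the `B` part. The `offsetToTop` slot is an assumption modelled loosely on what common C++ ABIs do; a real vtable would also hold the function slots.

```cpp
#include <cstddef>

// Only the offset slot is shown; function slots are omitted.
struct VTable {
    std::ptrdiff_t offsetToTop;   // bytes from this sub-object to the full object
};

struct A { const VTable* vptr; int a; };
struct B { const VTable* vptr; int b; };
struct C { A aPart; B bPart; int c; };   // hypothetical `class C : A, B`

// One vtable per view of C. The B view starts offsetof(C, bPart) bytes in.
const VTable kC_as_A = {0};
const VTable kC_as_B = {-static_cast<std::ptrdiff_t>(offsetof(C, bPart))};

// Static upcast C* -> B*: the offset is known at compile time.
B* upcastToB(C* obj) {
    return reinterpret_cast<B*>(reinterpret_cast<char*>(obj) + offsetof(C, bPart));
}

// Going back without knowing the static type: read the offset from the vtable.
void* toMostDerived(B* b) {
    return reinterpret_cast<char*>(b) + b->vptr->offsetToTop;
}
```

Note that the upcast to `A` needs no adjustment at all, because the `A` part sits at offset zero.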

A (dynamic_)cast from a pointer to a parent class to a pointer to a derived class implies a lookup in a probably large data structure (array, hash table, whatever), or is implemented through a single virtual function.
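As a deliberately simplified sketch of the lookup variant, assume single inheritance and let each type descriptor point at its parent, so the "data structure" is just a linked chain. `TypeInfo`, `Object` and `dynamicCast` are hypothetical names, and the `type` field stands in for the vtable pointer.

```cpp
// Hypothetical runtime type descriptor, reachable from every object.
struct TypeInfo {
    const char* name;
    const TypeInfo* parent;   // nullptr for a root class
};

struct Object {
    const TypeInfo* type;     // stands in for the vtable pointer
};

const TypeInfo kAnimal = {"Animal", nullptr};
const TypeInfo kDog    = {"Dog", &kAnimal};
const TypeInfo kCat    = {"Cat", &kAnimal};

// Walk from the object's exact type up to the root looking for `target`.
bool isInstanceOf(const Object* obj, const TypeInfo* target) {
    for (const TypeInfo* t = obj->type; t != nullptr; t = t->parent) {
        if (t == target) return true;
    }
    return false;
}

// Checked downcast: returns the pointer on success, nullptr on failure.
Object* dynamicCast(Object* obj, const TypeInfo* target) {
    return isInstanceOf(obj, target) ? obj : nullptr;
}
```

With multiple inheritance the chain becomes a graph and a successful cast also has to adjust the address, which is where the larger lookup structures come in.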

Every function call through the vtable needs the object pointer, which must be cast to a type suitable for that function. This can be done either by the caller, by reading a signed offset (associated with the function) from the vtable, or by the callee, which is then only a proxy (thunk) for the original function.
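The callee-side option might look like the following sketch. It assumes a hypothetical class `C` inheriting from `A` and `B`, with `C` overriding a method `describe` that was declared in `B`; every name here is illustrative.

```cpp
#include <cstddef>

struct B;   // forward declaration for the function-pointer signature

struct BVTable {
    int (*describe)(B* self);
};

struct A { int a; };
struct B { const BVTable* vptr; int b; };
struct C { A aPart; B bPart; int c; };   // hypothetical `class C : A, B`

// The real override expects a pointer to the full C object.
int C_describe(C* self) {
    return self->aPart.a + self->bPart.b + self->c;
}

// Proxy placed in the vtable slot: it receives a B*, corrects the address
// back to the start of the C object, then forwards to the real function.
int C_describe_thunkFromB(B* self) {
    C* full = reinterpret_cast<C*>(reinterpret_cast<char*>(self) - offsetof(C, bPart));
    return C_describe(full);
}

const BVTable kC_as_B = {C_describe_thunkFromB};

// The caller only knows it has a B* and does a plain indirect call.
int callDescribe(B* b) {
    return b->vptr->describe(b);
}
```

The caller-side option would instead store the offset next to the function pointer in the vtable and apply it before the call, keeping the callee unchanged.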

In some languages (especially functional languages), you can define references to (untyped) objects that carry the list of interfaces/types valid for the object. Such a reference contains one pointer to the underlying object and a list of pointers to the corresponding vtables.
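A sketch of such a "fat" reference, assuming just two interfaces so the list of vtables can be written as two fixed fields. `AreaVTable`, `NameVTable`, `ShapeRef` and the `Square` type are hypothetical.

```cpp
// Two independent interfaces, each with its own vtable type.
struct AreaVTable { int (*area)(const void* obj); };
struct NameVTable { const char* (*name)(const void* obj); };

// Plain data: the object itself carries no vtable pointer.
struct Square { int side; };

int Square_area(const void* obj) {
    const Square* s = static_cast<const Square*>(obj);
    return s->side * s->side;
}

const char* Square_name(const void*) {
    return "Square";
}

const AreaVTable kSquareArea = {Square_area};
const NameVTable kSquareName = {Square_name};

// The reference: one pointer to the object plus one vtable pointer
// per interface the object is viewed through.
struct ShapeRef {
    const void* object;
    const AreaVTable* area;
    const NameVTable* name;
};

ShapeRef makeRef(const Square* s) {
    ShapeRef ref = {s, &kSquareArea, &kSquareName};
    return ref;
}

int areaOf(const ShapeRef& ref)          { return ref.area->area(ref.object); }
const char* nameOf(const ShapeRef& ref)  { return ref.name->name(ref.object); }
```

The trade-off is that references get bigger, while objects stay free of hidden fields and can gain interfaces without changing their layout.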

+1