What can make C++ RTTI undesirable to use?

The LLVM documentation mentions that LLVM uses "a custom form of RTTI", and that this is why it has the isa<>, cast<> and dyn_cast<> template functions.

Usually, reading that a library reimplements some basic functionality of the language is a terrible code smell and just invites you to run. However, this is LLVM we are talking about: these people are working on a C++ compiler and a C++ runtime. If they don't know what they are doing, I'm pretty much screwed, because I prefer clang to the gcc version that ships with Mac OS.

Still, being less experienced than they are, I'm left wondering what the pitfalls of normal RTTI are. I know that it only works for types that have a v-table, but that only raises two questions:

  • Since you just need a virtual method to get a vtable, why don't they just mark a method as virtual? Virtual destructors seem to be good at this.
  • If their solution doesn't use regular RTTI, any idea how it was implemented?
+60
c++ rtti llvm
Feb 27 '11 at 18:20
4 answers

There are several reasons why LLVM rolls its own RTTI system. This system is simple and powerful, and is described in the LLVM Programmer's Manual. As another poster pointed out, the Coding Standards raise two major problems with C++ RTTI: 1) the space cost and 2) the poor performance of using it.

The space cost of RTTI is quite high: every class with a vtable (at least one virtual method) gets RTTI information, which includes the name of the class and information about its base classes. This information is used to implement the typeid operator as well as dynamic_cast. Because this cost is paid for every class with a vtable (and no, PGO and link-time optimization don't help, because the vtable points to the RTTI info), LLVM builds with -fno-rtti. Empirically, this saves on the order of 5-10% of executable size, which is pretty substantial. LLVM doesn't need an equivalent of typeid, so keeping names (among other things in type_info) around for each class is just a waste of space.

The poor performance is pretty easy to see if you do some benchmarking or look at the code generated for simple operations. The LLVM isa<> operator typically compiles down to a single load and a comparison with a constant (though classes control this based on how they implement their classof method). Here is a trivial example:

 #include "llvm/Constants.h"
 using namespace llvm;
 bool isConstantInt(Value *V) { return isa<ConstantInt>(V); }

This compiles to:

 $ clang t.cc -S -o - -O3 -I$HOME/llvm/include -D__STDC_LIMIT_MACROS -D__STDC_CONSTANT_MACROS -mkernel -fomit-frame-pointer
 ...
 __Z13isConstantIntPN4llvm5ValueE:
     cmpb $9, 8(%rdi)
     sete %al
     movzbl %al, %eax
     ret

which (if you don't read assembly) is a load and a compare against a constant. In contrast, the equivalent with dynamic_cast is:

 #include "llvm/Constants.h"
 using namespace llvm;
 bool isConstantInt(Value *V) { return dynamic_cast<ConstantInt*>(V) != 0; }

which compiles to:

 $ clang t.cc -S -o - -O3 -I$HOME/llvm/include -D__STDC_LIMIT_MACROS -D__STDC_CONSTANT_MACROS -mkernel -fomit-frame-pointer
 ...
 __Z13isConstantIntPN4llvm5ValueE:
     pushq %rax
     xorb %al, %al
     testq %rdi, %rdi
     je LBB0_2
     xorl %esi, %esi
     movq $-1, %rcx
     xorl %edx, %edx
     callq ___dynamic_cast
     testq %rax, %rax
     setne %al
 LBB0_2:
     movzbl %al, %eax
     popq %rdx
     ret

This is a lot more code, but the killer is the call to __dynamic_cast, which then has to dig through the RTTI data structures and do a very general, dynamically computed walk through them. This is several orders of magnitude slower than a load and a compare.

OK, OK, so it's slower; why does that matter? It matters because LLVM does a LOT of type checks. Many parts of the optimizers are built around pattern matching specific constructs in the code and performing substitutions on them. For example, here is some code for matching a simple pattern (which already knows that Op0/Op1 are the left- and right-hand sides of an integer subtract operation):

  // (X*2) - X -> X
  if (match(Op0, m_Mul(m_Specific(Op1), m_ConstantInt<2>())))
    return Op1;

The match operator and the m_* helpers are templates that boil down to a series of isa/dyn_cast calls, each of which has to do a type check. Using dynamic_cast for this kind of fine-grained pattern matching would be brutally, prohibitively slow.

Finally, there is another point, which is one of expressiveness. The different "rtti" operators that LLVM uses express different things: a type check, a dynamic_cast, a forced (asserting) cast, null handling, etc. C++'s dynamic_cast doesn't (natively) offer any of this functionality.
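To make the "load and compare" and the family of operators more concrete, here is a minimal sketch of how such a tag-based scheme could look. It is not LLVM's actual code: the Shape, Circle and Square classes are hypothetical, and the templates only mirror the names isa, cast, dyn_cast and dyn_cast_or_null. The only assumption taken from the text above is that each class exposes a static classof method.

```cpp
#include <cassert>

// Hypothetical miniature hierarchy; these are illustrative names, not LLVM classes.
class Shape {
public:
  enum Kind { K_Circle, K_Square };
  explicit Shape(Kind K) : TheKind(K) {}
  Kind getKind() const { return TheKind; }

private:
  const Kind TheKind; // the type tag: one small field, no vtable required
};

class Circle : public Shape {
public:
  Circle() : Shape(K_Circle) {}
  static bool classof(const Shape *S) { return S->getKind() == K_Circle; }
};

class Square : public Shape {
public:
  Square() : Shape(K_Square) {}
  static bool classof(const Shape *S) { return S->getKind() == K_Square; }
};

// Type check only: reads the tag and compares it with a constant.
template <typename To, typename From> bool isa(const From *V) {
  return To::classof(V);
}

// Forced cast: asserts on a mismatch instead of returning null.
template <typename To, typename From> To *cast(From *V) {
  assert(isa<To>(V) && "cast<> to an incompatible type");
  return static_cast<To *>(V);
}

// Checking cast: returns null on a mismatch, like dynamic_cast on pointers.
template <typename To, typename From> To *dyn_cast(From *V) {
  return isa<To>(V) ? static_cast<To *>(V) : nullptr;
}

// Variant that also accepts a null input and passes it through.
template <typename To, typename From> To *dyn_cast_or_null(From *V) {
  return V ? dyn_cast<To>(V) : nullptr;
}
```

In this sketch a class opts in simply by providing classof, and a class that needs a range check (for example, "any kind between two enum values") can express that inside its own classof.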

In the end, there are two ways to look at this situation. On the negative side, C++ RTTI is both too narrowly defined for what many people want (full reflection) and too slow to be useful even for simple things like what LLVM does. On the positive side, the C++ language is powerful enough that we can define abstractions like this as library code and opt out of using the language feature. One of my favorite things about C++ is how powerful and elegant libraries can be. RTTI isn't even very high among my least favorite features of C++ :) !

-Chris

+72
Feb 28 '11 at 6:36

LLVM coding standards seem to answer this question quite well:

In an effort to reduce code and executable size, LLVM does not use RTTI (e.g. dynamic_cast<>) or exceptions. These two language features violate the general C++ principle of "you only pay for what you use", causing executable bloat even if exceptions are never used in the code base, or if RTTI is never used for a class. Because of this, we turn them off globally in the code.

That said, LLVM does make extensive use of a hand-rolled form of RTTI that uses templates like isa<>, cast<>, and dyn_cast<>. This form of RTTI is opt-in and can be added to any class. It is also substantially more efficient than dynamic_cast<>.

+15
Feb 27 '11 at 18:41

Here is a great article on RTTI and why you might need to roll your own version of it.

I am not an expert on C++ RTTI, but I have implemented my own RTTI as well, because there are definitely reasons why you would need to. First, the C++ RTTI system is not very feature-rich: basically all you can do is cast types and get basic information. What if, at run time, you have a string with the name of a class and you want to construct an object of that class? Good luck doing that with C++ RTTI. Also, C++ RTTI is not really (or not easily) portable across modules: you cannot identify the class of an object that was created in another module (a dll/so or exe). Similarly, the C++ RTTI implementation is compiler-specific, and it is typically expensive to turn on in terms of the additional overhead of implementing it for all types. Finally, it is not really persistent, so it cannot be used for saving and loading files. For example, you might want to save an object's data to a file along with the "typeid" of its class, so that at load time you know which object to create in order to load that data; that cannot be done reliably with C++ RTTI. For all or some of these reasons, many frameworks have their own RTTI (from very simple to very feature-rich). Examples are wxWidgets, LLVM, Boost.Serialization, etc. This is really not that uncommon.

Since you need a virtual method to work with the vtable, why not just mark the method as virtual? Virtual destructors seem to be good at this.

That is probably what their RTTI system uses as well. Virtual functions are the basis for dynamic binding (run-time binding), and thus they are basically required for any kind of run-time type identification/information (not just C++ RTTI; any RTTI implementation will have to rely on virtual calls in one way or another).

If their solution does not use regular RTTI, any idea how it was implemented?

Sure, you can look up RTTI implementations in C++. I have written my own, and there are many libraries that have their own RTTI as well. It is actually fairly simple to write. Basically, all you need is a means of uniquely representing a type (i.e., the name of the class, some mangled version of it, or even a unique ID for each class), some sort of structure analogous to type_info that contains all the information about the type that you need, and a "hidden" virtual function in each class that returns this type information on request (if this function is overridden in each derived class, it will work). There are, of course, some additional things that can be done, such as a singleton repository of all types, possibly with associated factory functions (this can be useful for creating objects of a type when all that is known at run time is the name of the type, as a string, or the type ID). You may also want to add some virtual functions to allow dynamic casting (usually this is done by calling the most-derived class's cast function and performing a static_cast up to the type you want to cast to).
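The ingredients listed above can be sketched as follows. This is only an illustration under stated assumptions: TypeInfo, Object, Widget and Registry are hypothetical names, type identity is compared by address (so the sketch only holds within a single module), and the registry is keyed by a plain class-name string.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for type_info: holds only what this sketch needs.
struct TypeInfo {
  std::string name;
  const TypeInfo *base; // nullptr for the root class
};

class Object {
public:
  virtual ~Object() = default;
  static const TypeInfo &staticType() {
    static const TypeInfo info{"Object", nullptr};
    return info;
  }
  // The "hidden" virtual function; every derived class overrides it.
  virtual const TypeInfo &type() const { return staticType(); }
  // Walks the chain of base types to answer "is this object a <target>?".
  bool isA(const TypeInfo &target) const {
    for (const TypeInfo *t = &type(); t != nullptr; t = t->base)
      if (t == &target)
        return true;
    return false;
  }
};

class Widget : public Object {
public:
  static const TypeInfo &staticType() {
    static const TypeInfo info{"Widget", &Object::staticType()};
    return info;
  }
  const TypeInfo &type() const override { return staticType(); }
};

// Singleton repository mapping a type name to a factory function.
class Registry {
public:
  using Factory = std::function<std::unique_ptr<Object>()>;
  static Registry &instance() {
    static Registry registry;
    return registry;
  }
  void add(const std::string &name, Factory factory) {
    assert(factory && "factory must be callable");
    factories_[name] = std::move(factory);
  }
  // Returns nullptr when no factory is registered under that name.
  std::unique_ptr<Object> create(const std::string &name) const {
    auto it = factories_.find(name);
    if (it == factories_.end())
      return nullptr;
    return it->second();
  }

private:
  std::map<std::string, Factory> factories_;
};
```

With this in place, code that only has the string "Widget" at run time (read from a file, say) can ask the registry for a new object and then query its type through type() or isA().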

+9
Feb 27 '11 at 18:57

The predominant reason is that they struggle to keep memory usage as low as possible.

RTTI is only available for classes that have at least one virtual method, which means that instances of the class will contain a pointer to the virtual table.

On a 64-bit architecture (which is common today), a single pointer is 8 bytes. Since the compiler creates lots of small objects, this adds up pretty quickly.
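The per-object cost can be checked directly with sizeof. The two node types below are hypothetical and exist only for the comparison; the exact numbers depend on the platform's ABI and padding rules, so the sketch computes the difference rather than hard-coding it.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical node carrying only a hand-rolled type tag.
struct TaggedNode {
  std::uint8_t kind;
};

// The same node, made polymorphic by a virtual destructor.
struct VirtualNode {
  virtual ~VirtualNode() = default; // forces a vtable pointer into each object
  std::uint8_t kind;
};

// Extra bytes each object pays for being polymorphic on this platform.
constexpr std::size_t vtableOverhead() {
  return sizeof(VirtualNode) - sizeof(TaggedNode);
}
```

On common 64-bit ABIs the difference is typically larger than the 8 bytes of the pointer itself, because the small tag gets padded out to pointer alignment.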

Therefore there is an ongoing effort to remove virtual functions as much as possible (and practical), and to implement what would have been virtual functions with a switch statement, which has similar execution speed but a significantly lower memory impact.

Their constant concern about memory consumption has paid off: Clang consumes significantly less memory than gcc, for instance, which is important when you offer the library to clients.

On the other hand, it also means that adding a new kind of node usually means editing code in a good number of files, because every switch has to be adapted (thankfully, compilers issue a warning if you miss an enum member in a switch). So they accepted making maintenance a bit more difficult in the name of memory efficiency.
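As a rough illustration of that trade-off, here is a sketch in which a would-be virtual evaluate() method becomes a free function that switches on a kind tag. The Expr type and its fields are hypothetical and kept deliberately flat for brevity; a real AST would use separate node classes per kind.

```cpp
#include <cassert>

// Hypothetical expression node; the names are illustrative only.
struct Expr {
  enum Kind { K_Literal, K_Negate, K_Add };
  Kind kind;
  int value;       // used by K_Literal
  const Expr *lhs; // used by K_Negate and K_Add
  const Expr *rhs; // used by K_Add
};

// What would be a virtual evaluate() method becomes one free function.
// There is deliberately no default case, so a compiler can warn when a
// newly added Kind is not handled here.
int evaluate(const Expr &e) {
  switch (e.kind) {
  case Expr::K_Literal:
    return e.value;
  case Expr::K_Negate:
    return -evaluate(*e.lhs);
  case Expr::K_Add:
    return evaluate(*e.lhs) + evaluate(*e.rhs);
  }
  assert(false && "unhandled Expr kind");
  return 0;
}
```

Adding a fourth kind means touching this function and every other switch over Expr::Kind, which is exactly the maintenance cost described above.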

+3
Feb 27 '11 at 19:52


