How do C++ compilers optimize template code?

How do compilers avoid linear growth in the size of the compiled binary with each instance of a new template type?

I don’t see how a compiler can avoid creating a copy of the entire templated code for each new instantiation.

I would expect compile times and binary sizes to become extremely unwieldy for all but the simplest templates in a reasonably large code base. But their prevalence suggests that compilers can do some magic to make them practical.

+8
c++ templates
3 answers

Many template functions are small enough to be inlined effectively, so you do get linear growth in the binary, but it is no more than you would get with equivalent non-template functions.

The One Definition Rule is important here because it allows the compiler to assume that any instantiation of a template with the same template arguments generates identical code. If it finds that the template function has already been instantiated earlier in the source file, it can use that copy instead of generating a new one. Name mangling allows the linker to recognize the same function coming from different compiled sources. None of this is guaranteed, since your program should not be able to tell the difference between identical copies of a function, but compilers perform more complex optimizations than this every day.

The one case where duplicates must be filtered out is when the function contains a static variable: there can be only one copy of it. But this can be achieved either by filtering out the duplicate functions or by filtering out the duplicate static variables themselves.
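To make the static-variable point concrete, here is a minimal sketch. The names `call_count`, `touch_from_here`, `touch_from_there` and `demo_static_per_instantiation` are made up for illustration and do not come from the answer. It only shows the language-level guarantee within one file: every use of `call_count<int>` shares one static object, while `call_count<double>` gets its own.

```cpp
#include <cassert>

// Hypothetical helper: each distinct instantiation owns exactly one
// function-local static, no matter how many call sites refer to it.
template <typename T>
int& call_count() {
    static int count = 0;  // one object per distinct T
    return count;
}

// Two unrelated call sites that both use the int instantiation.
void touch_from_here()  { ++call_count<int>(); }
void touch_from_there() { ++call_count<int>(); }

// Small demo: returns the number of times the int instantiation was touched.
int demo_static_per_instantiation() {
    touch_from_here();
    touch_from_there();
    ++call_count<double>();             // separate instantiation, separate static
    assert(call_count<int>() == 2);     // both call sites hit the same counter
    assert(call_count<double>() == 1);  // unaffected by the int calls
    return call_count<int>();
}
```

If the two call sites lived in different source files, the same result would be expected, because the toolchain has to end up with a single `count` per instantiation.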

+6

There are several things that keep multiple instantiations from being too harmful to the executable size:

  • Many templates simply pass things through to another layer. Although there may be quite a lot of code, most of it disappears once the code is instantiated and inlined. Note that inlining (together with the optimizations it enables) can in some cases produce larger code. That said, inlining small functions often produces smaller (and faster) code, mainly because the call sequence that would otherwise be needed often takes more instructions than the inlined body, and because the optimizer is more likely to shrink the code further when it has a more holistic view of what is going on.
  • If the template code is not inlined, duplicate instantiations in different translation units need to be merged into a single instantiation. I am not a linker expert, but my understanding is that, for example, ELF uses separate sections, and the linker can choose to include only those sections that are actually used.
  • In larger executables you will need some vocabulary types and instantiations that are used in many places and are effectively shared. Doing everything with a custom type would be a bad idea, and type erasure is certainly an important tool for avoiding too many types.
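The "thin template over a shared layer" idea from the list above can be sketched as follows. This is only an illustration under stated assumptions: `count_equal_bytes`, `count_equal` and `demo_thin_wrapper` are hypothetical names, and the byte-wise comparison is restricted to integral types so that it is actually correct.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Non-template core (hypothetical): compiled once, regardless of how many
// element types end up using it. It only sees raw bytes.
inline std::size_t count_equal_bytes(const void* data, std::size_t count,
                                     std::size_t elem_size, const void* needle) {
    const unsigned char* p = static_cast<const unsigned char*>(data);
    std::size_t hits = 0;
    for (std::size_t i = 0; i < count; ++i) {
        if (std::memcmp(p + i * elem_size, needle, elem_size) == 0) {
            ++hits;
        }
    }
    return hits;
}

// Thin template layer: one tiny forwarding function per type, which an
// optimizer can typically inline away.
template <typename T, std::size_t N>
std::size_t count_equal(const T (&items)[N], const T& needle) {
    static_assert(std::is_integral<T>::value,
                  "byte-wise comparison is only used for integral types here");
    return count_equal_bytes(items, N, sizeof(T), &needle);
}

// Small demo using two different element types with the same shared core.
std::size_t demo_thin_wrapper() {
    const int  a[] = {1, 2, 2, 3};
    const long b[] = {5L, 5L, 7L};
    return count_equal(a, 2) + count_equal(b, 5L);  // 2 hits + 2 hits
}
```

Each new element type adds only a small forwarding stub; the loop itself exists once in the binary.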

However, where possible, it does pay off to pre-instantiate templates, especially if only a small number of instantiations are commonly used. A good example is the IOStreams library, which is unlikely to be used with more than 4 types (typically it is used with just one): moving the template definitions and their instantiations into separate translation units may not reduce the size of the executable, but it will certainly reduce the compile time! Starting with C++11, you can declare template instantiations as extern, which allows the definitions to be visible without being implicitly instantiated for specializations that are known to be instantiated elsewhere.
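A minimal sketch of the extern mechanism, squeezed into one file so it can be compiled on its own. The function `square`, the file names mentioned in the comments and `demo_extern_template` are assumptions made for illustration; in a real project the two marked parts would live in a header and in exactly one source file.

```cpp
#include <cassert>

// --- would normally live in a header, e.g. a hypothetical "square.h" ---
template <typename T>
T square(T x) { return x * x; }

// Explicit instantiation declaration (C++11): tells every translation unit
// that includes the header not to implicitly instantiate square<int>,
// because one translation unit provides it.
extern template int square<int>(int);

// --- would normally live in exactly one .cpp, e.g. a hypothetical "square.cpp" ---
// Explicit instantiation definition: the single place where square<int>
// is actually generated. It must come after the declaration above.
template int square<int>(int);

// Small demo: uses the explicitly instantiated specialization.
int demo_extern_template() {
    return square(7);
}
```

Types not covered by an extern declaration, such as `square<double>`, are still instantiated implicitly wherever they are used.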

+5

I think you misunderstand how templates are implemented. Templates are compiled on an as-needed basis into the corresponding class or function.

Consider the following code ...

    template <typename Type>
    Type mymax(Type a, Type b) {
        return a > b ? a : b;
    }

    int main(int argc, char** argv) {
    }

Compiling this, I get the following assembly.

        .file "example.cpp"
        .text
        .globl main
        .type main, @function
    main:
    .LFB1:
        .cfi_startproc
        pushq %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq %rsp, %rbp
        .cfi_def_cfa_register 6
        movl %edi, -4(%rbp)
        movq %rsi, -16(%rbp)
        movl $0, %eax
        popq %rbp
        .cfi_def_cfa 7, 8
        ret
        .cfi_endproc
    .LFE1:
        .size main, .-main
        .ident "GCC: (Ubuntu/Linaro 4.8.1-10ubuntu9) 4.8.1"
        .section .note.GNU-stack,"",@progbits

You will notice that it contains only the main function. Now I update my code to use the template function.

    int main(int argc, char** argv) {
        mymax<double>(3,4);
    }

Compiling this, I get a much longer assembly output, which now includes the template function instantiated for doubles. The compiler saw that the template function was used with double, so it generated a function to handle that case.

        .file "example.cpp"
        .text
        .globl main
        .type main, @function
    main:
    .LFB1:
        .cfi_startproc
        pushq %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq %rsp, %rbp
        .cfi_def_cfa_register 6
        subq $32, %rsp
        movl %edi, -4(%rbp)
        movq %rsi, -16(%rbp)
        movabsq $4616189618054758400, %rdx
        movabsq $4613937818241073152, %rax
        movq %rdx, -24(%rbp)
        movsd -24(%rbp), %xmm1
        movq %rax, -24(%rbp)
        movsd -24(%rbp), %xmm0
        call _Z5mymaxIdET_S0_S0_
        movl $0, %eax
        leave
        .cfi_def_cfa 7, 8
        ret
        .cfi_endproc
    .LFE1:
        .size main, .-main
        .section .text._Z5mymaxIdET_S0_S0_,"axG",@progbits,_Z5mymaxIdET_S0_S0_,comdat
        .weak _Z5mymaxIdET_S0_S0_
        .type _Z5mymaxIdET_S0_S0_, @function
    _Z5mymaxIdET_S0_S0_:
    .LFB2:
        .cfi_startproc
        pushq %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq %rsp, %rbp
        .cfi_def_cfa_register 6
        movsd %xmm0, -8(%rbp)
        movsd %xmm1, -16(%rbp)
        movsd -8(%rbp), %xmm0
        ucomisd -16(%rbp), %xmm0
        jbe .L9
        movq -8(%rbp), %rax
        jmp .L6
    .L9:
        movq -16(%rbp), %rax
    .L6:
        movq %rax, -24(%rbp)
        movsd -24(%rbp), %xmm0
        popq %rbp
        .cfi_def_cfa 7, 8
        ret
        .cfi_endproc
    .LFE2:
        .size _Z5mymaxIdET_S0_S0_, .-_Z5mymaxIdET_S0_S0_
        .ident "GCC: (Ubuntu/Linaro 4.8.1-10ubuntu9) 4.8.1"
        .section .note.GNU-stack,"",@progbits

Now let's say I change the code to use this function twice.

    int main(int argc, char** argv) {
        mymax<double>(3,4);
        mymax<double>(4,5);
    }

Again, look at the assembly it produces. It is comparable to the previous output, because most of that code was just the compiler generating the mymax function with "Type" replaced by double. No matter how many times I use this function, it is generated only once.

        .file "example.cpp"
        .text
        .globl main
        .type main, @function
    main:
    .LFB1:
        .cfi_startproc
        pushq %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq %rsp, %rbp
        .cfi_def_cfa_register 6
        subq $32, %rsp
        movl %edi, -4(%rbp)
        movq %rsi, -16(%rbp)
        movabsq $4616189618054758400, %rdx
        movabsq $4613937818241073152, %rax
        movq %rdx, -24(%rbp)
        movsd -24(%rbp), %xmm1
        movq %rax, -24(%rbp)
        movsd -24(%rbp), %xmm0
        call _Z5mymaxIdET_S0_S0_
        movabsq $4617315517961601024, %rdx
        movabsq $4616189618054758400, %rax
        movq %rdx, -24(%rbp)
        movsd -24(%rbp), %xmm1
        movq %rax, -24(%rbp)
        movsd -24(%rbp), %xmm0
        call _Z5mymaxIdET_S0_S0_
        movl $0, %eax
        leave
        .cfi_def_cfa 7, 8
        ret
        .cfi_endproc
    .LFE1:
        .size main, .-main
        .section .text._Z5mymaxIdET_S0_S0_,"axG",@progbits,_Z5mymaxIdET_S0_S0_,comdat
        .weak _Z5mymaxIdET_S0_S0_
        .type _Z5mymaxIdET_S0_S0_, @function
    _Z5mymaxIdET_S0_S0_:
    .LFB2:
        .cfi_startproc
        pushq %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq %rsp, %rbp
        .cfi_def_cfa_register 6
        movsd %xmm0, -8(%rbp)
        movsd %xmm1, -16(%rbp)
        movsd -8(%rbp), %xmm0
        ucomisd -16(%rbp), %xmm0
        jbe .L9
        movq -8(%rbp), %rax
        jmp .L6
    .L9:
        movq -16(%rbp), %rax
    .L6:
        movq %rax, -24(%rbp)
        movsd -24(%rbp), %xmm0
        popq %rbp
        .cfi_def_cfa 7, 8
        ret
        .cfi_endproc
    .LFE2:
        .size _Z5mymaxIdET_S0_S0_, .-_Z5mymaxIdET_S0_S0_
        .ident "GCC: (Ubuntu/Linaro 4.8.1-10ubuntu9) 4.8.1"
        .section .note.GNU-stack,"",@progbits

So, basically, templates do not affect the size of the executable any more than writing the functions by hand would. They are just a convenience. The compiler generates one function per type that is actually used, so whether I use it 1 or 1000 times with a given type, there will be only one instance. Now, if I update my code to also handle a new type such as float, I will get one more function in my executable, but only one, no matter how many times I use it.
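The same point can be checked from inside the program rather than from the assembly. The sketch below reuses `mymax` from this answer; the helper names `same_double_instance` and `demo_two_types` are made up for illustration. It relies only on the fact that every reference to `mymax<double>` names the same function, while `mymax<float>` is a second, separate instantiation.

```cpp
#include <cassert>

template <typename Type>
Type mymax(Type a, Type b) {
    return a > b ? a : b;
}

// Hypothetical helper: two references to mymax<double> name the same function,
// so their addresses compare equal.
bool same_double_instance() {
    double (*first)(double, double)  = &mymax<double>;
    double (*second)(double, double) = &mymax<double>;
    return first == second;
}

// Hypothetical helper: several uses of the double instantiation plus one use
// of a second instantiation for float.
double demo_two_types() {
    double d = mymax<double>(3, 4);      // 4.0
    d = mymax<double>(d, 5);             // 5.0, same instantiation reused
    float f = mymax<float>(1.5f, 0.5f);  // 1.5f, a second instantiation
    return d + f;                        // 5.0 + 1.5
}
```

Compiled without inlining, one would expect two weak symbols for this file, one for the double version and one for the float version, however many calls are added.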

+3