Does the C++ optimization level affect Python SWIG module performance?

I have a large Python SWIG module. The generated C++ wrapper comes to approximately 320,000 LoC (including headers, I assume). I currently compile it with -O1; g++ produces a 44 MiB binary and takes about 3 minutes to compile it.

If I turn off optimization (-O0), the binary comes out at 40 MB and takes 44 seconds to compile.

Will compiling the wrapper with -O0 significantly degrade the performance of the Python module? Before I profile the module at different optimization levels, has anyone done this analysis before, or does anyone have an idea whether it matters?

+4
2 answers

-O0 disables almost all of the optimizations GCC performs. And optimization matters.

So, without knowing much about your application, I would assume that it hurts your application's performance.

The optimization level that is usually safe to use is -O2.

You can check which optimizations GCC performs at each level at: http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html

But in the end, if you want to know for sure, you must compile at the different levels and profile each build.
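
As a rough sketch of what such a comparison could look like from the Python side, the helper below times a single callable with the standard `timeit` module. The helper name `per_call_seconds` and the module name mentioned in the comments are hypothetical; `abs` is only a stand-in for a wrapped function. The idea would be to build the extension once per optimization level and run the same measurement against each build.

```python
import timeit


def per_call_seconds(func, *args, number=100_000, repeat=5):
    """Return the best observed time per call of func(*args), in seconds."""
    timer = timeit.Timer(lambda: func(*args))
    # Take the minimum over several runs to reduce scheduling noise.
    return min(timer.repeat(repeat=repeat, number=number)) / number


if __name__ == "__main__":
    # Stand-in callable. Replace it with a function from your own module,
    # e.g. mymodule.some_call (hypothetical name), imported from the -O0
    # build and then from the -O1 or -O2 build.
    seconds = per_call_seconds(abs, -3)
    print(f"{seconds * 1e9:.1f} ns per call")
```

The lambda adds a small constant overhead to every call, but it is the same for each build, so the difference between builds is still meaningful. Cheap wrapped calls are the ones where wrapper overhead is most likely to show up.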

+3

This is a bad idea, whether or not SWIG modules are involved. Many optimizations happen even at gcc -O1, and you lose all of them if you turn optimization off.

You can see the difference by inspecting the asm your compiler generates. Of these optimizations, the ones I know offhand will hurt the generated SWIG wrapper if they are missing:

  • Dead code elimination:

     void foo() { int a = 1; a = 0; } 

    With -O1, this pointless code is removed entirely:

     foo:
         pushl %ebp
         movl %esp, %ebp
         popl %ebp
         ret

    whereas with -O0 it becomes:

     foo:
         pushl %ebp
         movl %esp, %ebp
         subl $16, %esp
         movl $1, -4(%ebp)
         movl $0, -4(%ebp)
         leave
         ret

  • Register allocation: without it, functions with a large number of local variables suffer, and most SWIG wrapper functions will take a hit from this. It's hard to show a brief example of this, though.

  • Another example: the output from gcc compiling a SWIG wrapper for the prototype:

     int foo(unsigned int a, unsigned int b, unsigned int c, unsigned int d); 

    With -O0 this produces:

     Java_testJNI_foo:
         pushl %ebp
         movl %esp, %ebp
         subl $88, %esp
         movl 16(%ebp), %eax
         movl %eax, -48(%ebp)
         movl 20(%ebp), %eax
         movl %eax, -44(%ebp)
         movl 24(%ebp), %eax
         movl %eax, -56(%ebp)
         movl 28(%ebp), %eax
         movl %eax, -52(%ebp)
         movl 32(%ebp), %eax
         movl %eax, -64(%ebp)
         movl 36(%ebp), %eax
         movl %eax, -60(%ebp)
         movl 40(%ebp), %eax
         movl %eax, -72(%ebp)
         movl 44(%ebp), %eax
         movl %eax, -68(%ebp)
         movl $0, -32(%ebp)
         movl -48(%ebp), %eax
         movl %eax, -28(%ebp)
         movl -56(%ebp), %eax
         movl %eax, -24(%ebp)
         movl -64(%ebp), %eax
         movl %eax, -20(%ebp)
         movl -72(%ebp), %eax
         movl %eax, -16(%ebp)
         movl -16(%ebp), %eax
         movl %eax, 12(%esp)
         movl -20(%ebp), %eax
         movl %eax, 8(%esp)
         movl -24(%ebp), %eax
         movl %eax, 4(%esp)
         movl -28(%ebp), %eax
         movl %eax, (%esp)
         call foo
         movl %eax, -12(%ebp)
         movl -12(%ebp), %eax
         movl %eax, -32(%ebp)
         movl -32(%ebp), %eax
         leave
         ret

    Compare that with -O1, which generates only:

     Java_testJNI_foo:
         pushl %ebp
         movl %esp, %ebp
         subl $24, %esp
         movl 40(%ebp), %eax
         movl %eax, 12(%esp)
         movl 32(%ebp), %eax
         movl %eax, 8(%esp)
         movl 24(%ebp), %eax
         movl %eax, 4(%esp)
         movl 16(%ebp), %eax
         movl %eax, (%esp)
         call foo
         leave
         ret

  • With -O1, g++ can generate much smarter code for:

     %module test
     %{
     int value() { return 100; }
     %}
     %feature("compactdefaultargs") foo;
     %inline %{
     int foo(int a=value(), int b=value(), int c=value()) { return 0; }
     %}
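
The asm comparisons in the list above can be repeated on your own code. The following is a minimal sketch, assuming `gcc` is on the PATH; the helper names `compile_to_asm` and `count_instructions` are made up for illustration. It compiles a C snippet at two optimization levels and gives a rough count of the emitted instructions. The count is only a crude proxy: it ignores what each instruction costs, and the line filtering assumes GNU assembler syntax as emitted for x86.

```python
import shutil
import subprocess


def compile_to_asm(c_source, opt_level):
    """Compile C source text with gcc and return the assembly listing."""
    result = subprocess.run(
        # -S: stop after generating assembly; "-x c": treat the input as C;
        # "-o -": write to stdout; the final "-": read the source from stdin.
        ["gcc", "-S", opt_level, "-x", "c", "-o", "-", "-"],
        input=c_source, capture_output=True, text=True, check=True,
    )
    return result.stdout


def count_instructions(asm_text):
    """Roughly count instruction lines, skipping labels and directives."""
    count = 0
    for line in asm_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith(".") or line.endswith(":"):
            continue
        count += 1
    return count


if __name__ == "__main__":
    source = "void foo(void) { int a = 1; a = 0; }\n"
    if shutil.which("gcc"):
        for level in ("-O0", "-O1"):
            print(level, count_instructions(compile_to_asm(source, level)))
```

Pointing the same helpers at a function taken from the generated wrapper file should give a quick feel for how much extra code -O0 leaves in place, before doing any real profiling.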

Short answer: with optimizations completely disabled, GCC generates extremely naive code. This is as true for SWIG wrappers as for any other program, if not more so, given the style of the automatically generated code.

+2