A simple test case between clang++ / g++ / gfortran

Question

I came across this question on scicomp, which involves computing a sum. It shows a C++ implementation and a similar Fortran implementation. Interestingly, the Fortran version was about 32% faster.

I was not sure about their results, so I tried to reproduce the situation. Here are the (very slightly) different codes that I ran:

C++

    #include <iostream>
    #include <complex>
    #include <cmath>
    #include <iomanip>

    int main ()
    {
        const double alpha = 1;
        std::cout.precision(16);

        std::complex<double> sum = 0;
        const std::complex<double> a = std::complex<double>(1,1)/std::sqrt(2.);
        for (unsigned int k=1; k<10000000; ++k)
        {
            sum += std::pow(a, k)*std::pow(k, -alpha);

            if (k % 1000000 == 0)
                std::cout << k << ' ' << sum << std::endl;
        }

        return 0;
    }

Fortran

    implicit none
    integer, parameter :: dp = kind(0.d0)
    complex(dp), parameter :: i_ = (0, 1)
    real(dp) :: alpha = 1
    complex(dp) :: s = 0
    integer :: k
    do k = 1, 10000000
        s = s + ((i_+1)/sqrt(2._dp))**k * k**(-alpha)
        if (modulo(k, 1000000) == 0) print *, k, s
    end do
    end

I compiled the codes above using gcc 4.6.3 and clang 3.0 on an Ubuntu 12.04 LTS machine, all with the -O3 flag. Here are my timings:

 time ./a.out

gfortran

    real    0m1.538s
    user    0m1.536s
    sys     0m0.000s

g++

    real    0m2.225s
    user    0m2.228s
    sys     0m0.000s

clang++

    real    0m1.250s
    user    0m1.244s
    sys     0m0.004s

Interestingly, I also see that the Fortran code is faster than the C++ code by about the same 32% when using gcc. With clang, however, the C++ code actually runs about 19% faster than the Fortran code. Here are my questions:

Why is the code generated by g++ slower than the code generated by gfortran? Since they are from the same compiler family, does this mean that (this) Fortran code can simply be translated into faster code? Is that generally the case for Fortran vs. C++?
Why does clang do so well here? Is there a Fortran front-end for the LLVM compiler? If there is, would the code it generates be even faster?

UPDATE:

Using the options -ffast-math -O3 produces the following results:

gfortran

    real    0m1.515s
    user    0m1.512s
    sys     0m0.000s

g++

    real    0m1.478s
    user    0m1.476s
    sys     0m0.000s

clang++

    real    0m1.253s
    user    0m1.252s
    sys     0m0.000s

Now the g++ version runs as fast as the gfortran version, and clang is still faster than both. Adding -fcx-fortran-rules to the options above does not significantly change the results.

c++ gcc fortran clang llvm

Gradguy May 19, '13 at 21:18

2 answers

varepsilon · Answer 1 · 2013-05-22T08:15:39+0000

I believe that your problem is in the output part. It is well known that C++ streams (std::cout) are often very inefficient. While different compilers may optimize this differently, it is always recommended to rewrite performance-critical parts using the C function printf instead of std::cout.
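As a minimal sketch of that suggestion, the progress line from the question could be formatted with the C formatting functions instead of a stream. The helper name format_progress is hypothetical (it is not part of the question's code), and the "%.16g" format is an assumption chosen to mirror std::cout.precision(16):

```cpp
#include <complex>
#include <cstdio>
#include <string>

// Hypothetical helper, not from the original post: builds the progress line
// "k (re,im)" with snprintf instead of std::cout. The "%.16g" conversions are
// an assumption meant to match std::cout.precision(16) in the question.
std::string format_progress(unsigned int k, const std::complex<double>& sum) {
    char buffer[128];
    std::snprintf(buffer, sizeof buffer, "%u (%.16g,%.16g)",
                  k, sum.real(), sum.imag());
    return std::string(buffer);
}
```

Inside the loop, the line could then be written with std::puts(format_progress(k, sum).c_str()), or the same format string could be passed to std::printf directly.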

steabert · Answer 2 · 2013-05-22T13:15:30+0000

The time difference will be related to the time it takes to execute pow, since the rest of the code is relatively simple. You can verify this by profiling. The question then becomes: what does the compiler do to compute the power function?
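Short of running a real profiler, a rough check is to time the power computation in isolation. The sketch below is only a crude stand-in for profiling; the names seconds_taken and sum_of_powers are hypothetical, and the measured numbers will depend on the compiler, flags, and standard library in use:

```cpp
#include <chrono>
#include <cmath>
#include <complex>

// Hypothetical helper: returns the wall-clock seconds spent inside f().
template <typename F>
double seconds_taken(F f) {
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Hypothetical helper: sums only the complex powers a^k for k = 1 .. n-1,
// with a = (1+i)/sqrt(2) as in the question, so that the cost of std::pow
// can be measured without the rest of the loop body.
std::complex<double> sum_of_powers(unsigned int n) {
    const std::complex<double> a = std::complex<double>(1, 1) / std::sqrt(2.0);
    std::complex<double> sum = 0;
    for (unsigned int k = 1; k < n; ++k)
        sum += std::pow(a, k);
    return sum;
}
```

Calling seconds_taken with a lambda that stores the result of sum_of_powers(10000000) in a variable that is used afterwards (so the call is not optimized away) gives a number that can be compared with the total run time reported by time ./a.out.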

My timings: about 1.20 s for the Fortran version with gfortran -O3 and 1.07 s for the C++ version compiled with g++ -O3 -ffast-math. Note that -ffast-math does not matter for gfortran, since pow will be called from the library, but it makes a big difference for g++.

In my case, for gfortran, the function that gets called is _gfortran_pow_c8_i4 (source code). Its implementation is the usual way of computing integer powers. With g++, on the other hand, it is a function template from the libstdc++ library, but I do not know how it is implemented. It seems to be a little better written/optimized. I don't know to what extent the function is compiled on the fly, given that it is a template. For what it's worth, the Fortran version compiled with ifort and the C++ version compiled with icc (both using the -fast optimization flag) give the same timings, so I assume they use the same library functions.

If I just write a power function in Fortran with complex arithmetic (explicitly writing out the real and imaginary parts), it is as fast as the C++ version compiled with g++ (but then -ffast-math slows it down, so I stuck with only -O3 for gfortran):

    complex(8) function pow_c8_i4(a, k)
        implicit none
        integer, intent(in) :: k
        complex(8), intent(in) :: a
        real(8) :: Re_a, Im_a, Re_pow, Im_pow, tmp
        integer :: i
        Re_pow = 1.0_8
        Im_pow = 0.0_8
        Re_a = real(a)
        Im_a = aimag(a)
        i = k
        do while (i.ne.0)
            if (iand(i,1).eq.1) then
                tmp = Re_pow
                Re_pow = Re_pow*Re_a-Im_pow*Im_a
                Im_pow = tmp   *Im_a+Im_pow*Re_a
            end if
            i = ishft(i,-1)
            tmp = Re_a
            Re_a = Re_a**2-Im_a**2
            Im_a = 2*tmp*Im_a
        end do
        pow_c8_i4 = cmplx(Re_pow,Im_pow,8)
    end function

In my experience, using explicit real and imaginary parts in Fortran implementations is faster, although the complex types are of course very convenient to use.
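For comparison, the square-and-multiply loop of the Fortran function above can be sketched in C++ with std::complex. This is only an illustration of the same technique for non-negative exponents; it is not the libstdc++ implementation, and the name pow_by_squaring is hypothetical:

```cpp
#include <complex>

// Hypothetical helper, not the libstdc++ implementation: computes
// base^exponent for a non-negative integer exponent by repeated squaring,
// mirroring the loop structure of the Fortran function pow_c8_i4.
std::complex<double> pow_by_squaring(std::complex<double> base,
                                     unsigned int exponent) {
    std::complex<double> result = 1;
    while (exponent != 0) {
        if (exponent & 1u)   // lowest bit set: fold the current square into the result
            result *= base;
        exponent >>= 1;      // move on to the next bit of the exponent
        base *= base;        // base now holds the next repeated square
    }
    return result;
}
```

The loop performs on the order of log2(exponent) complex multiplications, so for k close to 10^7 that is a few dozen multiplications per call.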

Final note: although this is just an example, calling the power function in every iteration is extremely inefficient. Instead, you should of course just multiply the running power by a in each iteration.
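A minimal sketch of that last point in C++, assuming the same a = (1+i)/sqrt(2) and loop bounds as in the question; the function name sum_with_running_power and its parameters are hypothetical. Because rounding errors accumulate differently than with a fresh pow call per term, the last digits of the result may differ slightly from the original program:

```cpp
#include <cmath>
#include <complex>

// Hypothetical helper, not from the original post: computes the sum of
// a^k * k^(-alpha) for k = 1 .. n-1, keeping a running value of a^k
// instead of calling the complex power function in every iteration.
std::complex<double> sum_with_running_power(unsigned int n, double alpha) {
    const std::complex<double> a = std::complex<double>(1, 1) / std::sqrt(2.0);
    std::complex<double> sum = 0;
    std::complex<double> a_pow_k = 1;  // holds a^k at the point where it is used
    for (unsigned int k = 1; k < n; ++k) {
        a_pow_k *= a;                  // a^k = a^(k-1) * a: one multiplication per step
        sum += a_pow_k * std::pow(static_cast<double>(k), -alpha);
    }
    return sum;
}
```

With n = 10000000 and alpha = 1 this covers the same range of k as the loop in the question, while replacing the complex power call by a single complex multiplication per iteration.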
