The time difference will be related to the time it takes for pow to execute, since the other code is relatively simple. You can verify this by profiling. The question then becomes, what does the compiler do to calculate the power function?
My timings: about 1.20 s for the Fortran version compiled with gfortran -O3, and 1.07 s for the C++ version compiled with g++ -O3 -ffast-math. Note that -ffast-math does not matter for gfortran, since pow is called from the library, but it makes a big difference for g++.
In my case, for gfortran, the function that gets called is _gfortran_pow_c8_i4 (source code). Its implementation is the usual way of calculating integer powers. With g++, on the other hand, it is a function template from the libstdc++ library, but I do not know how it is implemented. It seems to be slightly better written or easier to optimize. I don't know to what extent the function is compiled on the fly, given that it is a template. For what it's worth, the Fortran version compiled with ifort and the C++ version compiled with icc (both using the -fast optimization flag) give the same timings, so I assume they use the same library functions.
If I simply write a power function in Fortran that does the complex arithmetic by hand (explicitly writing out the real and imaginary parts), it is as fast as the C++ version compiled with g++ (but then -ffast-math slows it down, so I stuck with only -O3 for gfortran):
    complex(8) function pow_c8_i4(a, k)
      implicit none
      integer, intent(in) :: k
      complex(8), intent(in) :: a
      real(8) :: Re_a, Im_a, Re_pow, Im_pow, tmp
      integer :: i

      Re_pow = 1.0_8
      Im_pow = 0.0_8
      Re_a = real(a)
      Im_a = aimag(a)
      i = k
      do while (i.ne.0)
        if (iand(i,1).eq.1) then
          tmp = Re_pow
          Re_pow = Re_pow*Re_a-Im_pow*Im_a
          Im_pow = tmp   *Im_a+Im_pow*Re_a
        end if
        i = ishft(i,-1)
        tmp = Re_a
        Re_a = Re_a**2-Im_a**2
        Im_a = 2*tmp*Im_a
      end do
      pow_c8_i4 = cmplx(Re_pow,Im_pow,8)
    end function
In my experience, working with explicit real and imaginary parts in Fortran is faster than using the complex types, convenient as those types are.
Final note: although this is just an example, calling the power function in every iteration is extremely inefficient. Instead, you should of course just multiply by a in each iteration.
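To illustrate that last point, here is a minimal C++ sketch. The original loop is not reproduced here, so the loop body (summing a^k for k = 0 .. n-1) and the names sum_with_pow and sum_with_running_product are assumptions made purely for illustration. The first function calls std::pow in every iteration; the second carries the current power forward with a single complex multiplication per iteration.

```cpp
#include <complex>

using cplx = std::complex<double>;

// Hypothetical loop (not the original code): sum of a^k for k = 0 .. n-1,
// calling the power function in every iteration.
cplx sum_with_pow(cplx a, int n) {
    cplx sum(0.0, 0.0);
    for (int k = 0; k < n; ++k) {
        sum += std::pow(a, k);   // recomputes a^k from scratch each time
    }
    return sum;
}

// Same sum, but a^k is carried forward: one multiplication per iteration.
cplx sum_with_running_product(cplx a, int n) {
    cplx sum(0.0, 0.0);
    cplx power(1.0, 0.0);        // a^0
    for (int k = 0; k < n; ++k) {
        sum += power;
        power *= a;              // a^(k+1) = a^k * a
    }
    return sum;
}
```

Both functions should agree up to rounding error. The running-product version replaces each pow call with one multiplication, at the cost of accumulating rounding error slightly differently over long loops.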