What causes the terrible performance of pow() for NaN values? As far as I can tell, NaNs should not affect performance if floating-point math is done using SSE instead of the x87 FPU.
This seems to be true for elementary operations, but not for pow(). I compared multiplying and dividing a double against squaring it and then taking the square root. If I compile the code below using g++ -lrt, I get the following result:
multTime(3.14159): 20.1328ms
multTime(nan): 244.173ms
powTime(3.14159): 92.0235ms
powTime(nan): 1322.33ms
As expected, calculations involving NaN take significantly longer. Compiling with g++ -lrt -msse2 -mfpmath=sse, however, gives the following times:
multTime(3.14159): 22.0213ms
multTime(nan): 13.066ms
powTime(3.14159): 97.7823ms
powTime(nan): 1211.27ms
NaN multiplication / division is now much faster (actually faster than with a real number), but squaring and taking the square root still takes a lot of time.
Test code (compiled with gcc 4.1.2 on 32-bit OpenSuSE 10.2 in VMWare, CPU - Core i7-2620M)
#include <iostream>
#include <sys/time.h>
#include <time.h>   // clock_gettime, CLOCK_PROCESS_CPUTIME_ID
#include <cmath>

// Times 1,000,000 rounds of doubling and halving d.
void multTime( double d )
{
    struct timespec startTime, endTime;
    double durationNanoseconds;

    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &startTime);
    for(int i=0; i<1000000; i++)
    {
        d = 2*d;
        d = 0.5*d;
    }
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &endTime);

    durationNanoseconds = 1e9*(endTime.tv_sec - startTime.tv_sec)
                        + (endTime.tv_nsec - startTime.tv_nsec);
    std::cout << "multTime(" << d << "): " << durationNanoseconds/1e6 << "ms" << std::endl;
}

// Times 1,000,000 rounds of squaring d and taking the square root via pow().
void powTime( double d )
{
    struct timespec startTime, endTime;
    double durationNanoseconds;

    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &startTime);
    for(int i=0; i<1000000; i++)
    {
        d = pow(d,2);
        d = pow(d,0.5);
    }
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &endTime);

    durationNanoseconds = 1e9*(endTime.tv_sec - startTime.tv_sec)
                        + (endTime.tv_nsec - startTime.tv_nsec);
    std::cout << "powTime(" << d << "): " << durationNanoseconds/1e6 << "ms" << std::endl;
}

int main()
{
    multTime(3.14159);
    multTime(NAN);

    powTime(3.14159);
    powTime(NAN);
}
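For comparison, the two pow() calls in powTime() can be written with elementary operations only. This is a minimal sketch, not part of the measurements above; the helper name squareThenRoot is made up for illustration, and whether it actually avoids the NaN slowdown on a given setup would have to be measured.

```cpp
#include <cmath>

// Hypothetical helper (not from the original test code): one round of
// "square, then square root" without calling pow().
double squareThenRoot(double d)
{
    d = d * d;          // replaces pow(d, 2)
    d = std::sqrt(d);   // replaces pow(d, 0.5); d is non-negative or NaN here
    return d;
}
```

Dropping this into the loop body of powTime() in place of the two pow() calls would show whether the slowdown comes from pow() itself rather than from NaN arithmetic in general.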
Edit:
Unfortunately, my knowledge of this topic is extremely limited, but my guess is that glibc's pow() never uses SSE on a 32-bit system and instead uses the assembly implementation in sysdeps/i386/fpu/e_pow.S. There is a __ieee754_pow_sse2 function in later versions of glibc, but it lives in sysdeps/x86_64/fpu/multiarch/e_pow.c and therefore probably only works on x64. However, all of this may be irrelevant here, since pow() is also a gcc built-in function. For an easy fix, see Z boson's answer.
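If the slowdown really is inside the library pow(), one possible workaround is to return early for NaN arguments before pow() is ever called. The sketch below is only an illustration under that assumption and is not necessarily what the referenced answer proposes. The name powNanGuard is hypothetical, and std::isnan assumes a C++11 (or later) standard library.

```cpp
#include <cmath>

// Hypothetical wrapper (name is illustrative): never passes NaN to pow().
double powNanGuard(double x, double y)
{
    // C99 Annex F special cases: pow(x, 0) is 1 and pow(1, y) is 1,
    // even when the other argument is NaN.
    if (y == 0.0 || x == 1.0) return 1.0;

    // Any remaining NaN input yields NaN.
    if (std::isnan(x)) return x;
    if (std::isnan(y)) return y;

    return std::pow(x, y);
}
```

The extra comparisons cost a little on every call, so this only pays off if NaN inputs are common enough and the library path is as slow as the timings above suggest.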