pow(NaN) is very slow

What is the cause of the disastrous performance of pow() for NaN values? As far as I can tell, NaNs should not affect performance if floating-point math is done using SSE instead of the x87 FPU.

This seems to hold for elementary operations, but not for pow(). I compared multiplying and dividing a double against squaring it and then taking the square root. If I compile the code below using g++ -lrt, I get the following output:

    multTime(3.14159): 20.1328ms
    multTime(nan): 244.173ms
    powTime(3.14159): 92.0235ms
    powTime(nan): 1322.33ms

As expected, calculations involving NaN take significantly longer. Compiling with g++ -lrt -msse2 -mfpmath=sse, however, gives the following times:

    multTime(3.14159): 22.0213ms
    multTime(nan): 13.066ms
    powTime(3.14159): 97.7823ms
    powTime(nan): 1211.27ms

NaN multiplication / division is now much faster (actually faster than with a real number), but squaring and taking the square root still takes a lot of time.

Test code (compiled with gcc 4.1.2 on 32-bit OpenSuSE 10.2 in VMware, CPU: Core i7-2620M):

    #include <iostream>
    #include <sys/time.h>
    #include <cmath>

    void multTime( double d )
    {
        struct timespec startTime, endTime;
        double durationNanoseconds;

        clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &startTime);

        for(int i=0; i<1000000; i++)
        {
            d = 2*d;
            d = 0.5*d;
        }

        clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &endTime);
        durationNanoseconds = 1e9*(endTime.tv_sec - startTime.tv_sec) + (endTime.tv_nsec - startTime.tv_nsec);
        std::cout << "multTime(" << d << "): " << durationNanoseconds/1e6 << "ms" << std::endl;
    }

    void powTime( double d )
    {
        struct timespec startTime, endTime;
        double durationNanoseconds;

        clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &startTime);

        for(int i=0; i<1000000; i++)
        {
            d = pow(d,2);
            d = pow(d,0.5);
        }

        clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &endTime);
        durationNanoseconds = 1e9*(endTime.tv_sec - startTime.tv_sec) + (endTime.tv_nsec - startTime.tv_nsec);
        std::cout << "powTime(" << d << "): " << durationNanoseconds/1e6 << "ms" << std::endl;
    }

    int main()
    {
        multTime(3.14159);
        multTime(NAN);

        powTime(3.14159);
        powTime(NAN);
    }

Edit:

Unfortunately, my knowledge of this topic is extremely limited, but I think that glibc's pow() never uses SSE on a 32-bit system and instead uses the assembly in sysdeps/i386/fpu/e_pow.S. There is a __ieee754_pow_sse2 function in later versions of glibc, but it is in sysdeps/x86_64/fpu/multiarch/e_pow.c and therefore probably only works on x64. However, all of this may be irrelevant here, since pow() is also a gcc built-in function. For an easy fix, see Z boson's answer.

+8
c++ performance nan pow
4 answers

"NaNs should not affect performance if floating-point math is done using SSE instead of the x87 FPU."

I'm not sure this follows from the resource you are quoting. In any case, pow is a C library function. It is not implemented as an instruction, even on x87. So there are two separate questions here: how does SSE handle NaN values, and how does the pow implementation handle NaN values?

If the pow implementation takes a separate path for special values such as +/-Inf or NaN, you can expect a NaN base or exponent to return a result quickly. On the other hand, the implementation may not treat this as a separate case and may simply rely on floating-point operations to propagate NaN through the intermediate results.

Starting with Sandy Bridge, many of the performance penalties associated with denormals were reduced or eliminated. Not all of them, however, as the author describes a penalty for mulps. It would therefore be reasonable to expect that not all arithmetic operations involving NaNs are "fast". Some architectures may even fall back to microcode to handle NaNs in certain contexts.

+8

Your math library is too old. Either find another math library that handles NaN better in pow, or implement a fix like this:

    inline double pow_fix(double x, double y)
    {
        if(x!=x) return x;  // x is NaN (NaN never compares equal to itself)
        if(y!=y) return y;  // y is NaN
        return pow(x,y);
    }

Compile with g++ -O3 -msse2 -mfpmath=sse foo.cpp .

+3

If you want to square a value or take its square root, use d*d or sqrt(d). pow(d,2) and pow(d,0.5) will be slower, and possibly less accurate, unless your compiler optimizes them based on the constant second arguments 2 and 0.5. Note that this optimization is not always possible for pow(d,0.5), since it returns 0.0 when d is negative zero, whereas sqrt(d) returns -0.0.

For those who do timings, make sure you test the same thing.

+2

With a complex function such as pow(), there are many ways in which NaN can cause slowness. It may be that operations on NaNs are slow, or it may be that the pow() implementation checks for all kinds of special values that it can handle efficiently, and NaN values fail all of those tests, so a more expensive path is taken. You would need to step through the code to find out for sure.

A later implementation of pow() may include additional checks to handle NaN more efficiently, but this is always a trade-off: it would be a shame for pow() to handle "normal" cases more slowly in order to speed up NaN handling.

My blog post only applies to individual instructions, not to complex functions like pow().

+2