Fast pow (x, 0.5f) implementation faster than fast sqrt (x)?

Question

Fast pow (x, 0.5f) implementation faster than fast sqrt (x)?

I am wondering if a quick implementation of pow (), like this one , is a faster way to get the square root of an integer than a quick sqrt (x). We know that

sqrt(x) = pow(x, 0.5f)

I can not check the speed myself, because I did not find a quick sqrt implementation. My question is: Is fast implementation of pow (x, 0.5f) faster than fast sqrt (x)?

Edit: I meant powf-pow, which accepts float instead of doubles. (doubles are more misleading)

+7

c ++ performance c math

Zaffy Aug 4 '12 at 17:43

source share

3 answers

The results that execute the following code in 64-bit MSVC ++ 2013 mode, full optimization. ~ 9X for sqrt ();

Distance 2619435809228.278300

Pow () duration was 18413.000000 milliseconds

Distance 2619435809228.278300

Sqrt () elapsed time was 2002.000000 milliseconds

 #define LOOP_KNT 249000000 // (SHRT_MAX * 1024) int main(void) { time_t start = clock(); double distance = 0, result = 0; start = clock(); for(int i=0; i<LOOP_KNT; i++) { result = pow(i, 0.50); distance += result; } printf("\nDistance is %f", distance); printf("\nPow() elapsed time was %f milliseconds", (double)clock() - (double)(start)); distance = 0, result = 0; start = clock(); for(int i=0; i<LOOP_KNT; i++) { result = sqrt(i); distance += result; } printf("\nDistance is %f", distance); printf("\nSqrt() elapsed time was %f milliseconds", (double)clock() - (double)(start)); printf("\nHit any key to end program.\n"); getchar(); return 0; }

No manual, theorizing or pontificating is required. Just write a test and see the result.

+3

user2548100 Feb 07 '14 at 20:50

source share

In the general case, given the same error limits, a more specific problem can be more optimized than a more general one.

Therefore, you can take this algorithm and replace b with a constant of 0.5, and now you have sqrt (), which is at least as fast as this pow (). Now that it is permanent, the compiler (or person) can do optimizations based on this.

Note that the pow () function is approximate and has a (relatively) large error, and therefore is not as accurate as, say, most sqrt library functions. If you relax the sqrt implementation within the same approximation limits, you can really do it at least as fast.

+1

Max Aug 4 '12 at 17:51

source share

Eric Postpischil · Accepted Answer · 2012-08-04T17:52:25+0000

As for the standard C library sqrt and pow , the answer is no .

First, if pow(x, .5f) faster than the sqrt(x) implementation, the engineer assigned to support sqrt would replace the pow(x, .5f) .

Secondly, sqrt implementation in commercial libraries is usually optimized specifically for this task, often by people who are knowledgeable about writing high-performance software and who write in or near assembly language to get the maximum performance available for the processor.

Thirdly, many processors have instructions to execute sqrt or to facilitate its calculation. (Typically, there is an instruction to provide an estimate of the inverse square root and instructions to refine this estimate.)

but

The code you linked / set that you specified is an attempt to roughly approximate sqrt using roughly approximated pow .

I converted the final version of the subroutine approach mentioned in the question to C and measured its execution time when calculating pow(3, .5) . I also measured the runtime of the system (Mac OS X 10.8) pow and sqrt and the sqrt approximation here (with one iteration and multiplication by the argument to end to get the square root, not its inverse).

First, the calculated results: Field approximation returns 1.72101. The sqrt approximation returns 1.73054. The correct value returned by the pow and sqrt system is 1.73205.

Running in 64-bit mode on MacPro4,1, approximating the field takes about 6 cycles, the pow system takes 29 cycles, the square root approximation takes 10 cycles, and the sqrt system takes 29 cycles. These times may include some overhead for loading the arguments and storing the results (I used mutable variables to prevent the compiler from optimizing otherwise useless loop iterations so that I could measure them).

(These times are “effective bandwidth,” essentially the number of processor cycles that one call starts when another can start.)

Fast pow (x, 0.5f) implementation faster than fast sqrt (x)?

but

More articles: