As for the C standard library sqrt and pow, the answer is no.
First, if pow(x, .5f) were faster than an implementation of sqrt(x), the engineer assigned to maintain sqrt would replace that implementation with pow(x, .5f).
Second, implementations of sqrt in commercial libraries are typically optimized specifically for that task, often by people who are knowledgeable about writing high-performance software and who write in or near assembly language to get the maximum performance available from the processor.
Third, many processors have instructions for performing sqrt or for assisting in computing it. (Commonly, there is an instruction that provides an estimate of the reciprocal of the square root and instructions to refine that estimate.)
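The estimate-then-refine idea can be sketched in portable C. This is only an illustration: the function names are hypothetical, the starting estimate is passed in by the caller as a stand-in for what a hardware estimate instruction would supply, and the refinement shown is the usual Newton-Raphson step for 1/sqrt(x), which may differ from what a given processor or library does.

```c
/* One Newton-Raphson step for y ~ 1/sqrt(x):
 *     y_next = y * (1.5 - 0.5 * x * y * y)
 * Each step roughly squares the relative error of the estimate. */
double refine_rsqrt(double x, double y)
{
    return y * (1.5 - 0.5 * x * y * y);
}

/* Hypothetical helper: turns a rough reciprocal-square-root estimate
 * into a square root. `estimate` stands in for the value a hardware
 * estimate instruction would return. */
double sqrt_from_estimate(double x, double estimate, int steps)
{
    double y = estimate;
    for (int i = 0; i < steps; ++i)
        y = refine_rsqrt(x, y);
    /* x * (1/sqrt(x)) == sqrt(x) */
    return x * y;
}
```

For example, sqrt_from_estimate(3.0, 0.5, 4) starts from the rough guess 0.5 for 1/sqrt(3) and converges to a value very close to 1.73205 after four steps.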
However, the code you linked to or included in your question is an attempt to coarsely approximate sqrt using a coarsely approximated pow.
I converted the final version of the approximation routine in the question to C and measured its run time when computing pow(3, .5). I also measured the run time of the system (Mac OS X 10.8) pow and sqrt and of the sqrt approximation here (with one iteration and with a multiplication by the argument at the end to get the square root rather than its reciprocal).
First, the calculated results: The pow approximation returns 1.72101. The sqrt approximation returns 1.73054. The correct value, returned by the system pow and sqrt, is 1.73205.
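For readers who want to reproduce the sqrt approximation figure, here is a minimal sketch. It assumes the linked approximation is the well-known bit-trick reciprocal square root with the constant 0x5f3759df; the link target is not shown here, so that is a guess, and the function name approx_sqrt is hypothetical. It uses one Newton-Raphson iteration and a final multiplication by the argument, as described above.

```c
#include <stdint.h>
#include <string.h>

/* Assumed form of the approximation: bit-trick initial guess for
 * 1/sqrt(x), one Newton-Raphson iteration, then multiply by x.
 * Valid only for positive, normal float arguments. */
float approx_sqrt(float x)
{
    uint32_t bits;
    float y;

    /* Reinterpret the float's bits without violating aliasing rules. */
    memcpy(&bits, &x, sizeof bits);

    /* Initial guess for 1/sqrt(x) from the exponent/mantissa bits. */
    bits = 0x5f3759dfu - (bits >> 1);
    memcpy(&y, &bits, sizeof y);

    /* One Newton-Raphson refinement of the reciprocal square root. */
    y = y * (1.5f - 0.5f * x * y * y);

    /* Multiply by the argument to get sqrt(x) instead of 1/sqrt(x). */
    return x * y;
}
```

Under that assumption, approx_sqrt(3.0f) comes out close to the 1.73054 reported above, roughly 0.09% below the correct value.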
Running in 64-bit mode on a MacPro4,1, the pow approximation takes about 6 cycles, the system pow takes 29 cycles, the square root approximation takes 10 cycles, and the system sqrt takes 29 cycles. These times may include some overhead for loading arguments and storing results (I used volatile variables to keep the compiler from optimizing away otherwise useless loop iterations, so that I could measure them).
(These times are "effective throughput": essentially the number of processor cycles from when one call can start to when the next call can start.)
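A measurement loop of this kind might look like the sketch below. It is not the harness used for the numbers above: the names time_calls and baseline are hypothetical, and it reports seconds per call using the standard clock() function rather than processor cycles. The volatile qualifier forces a load of the argument and a store of the result on every iteration, so the compiler cannot discard the loop.

```c
#include <time.h>

/* Hypothetical baseline: returns its argument unchanged, so timing it
 * gives a rough idea of the loop, load, and store overhead. */
double baseline(double x)
{
    return x;
}

/* Average seconds per call of `function` over `iterations` calls. */
double time_calls(double (*function)(double), long iterations)
{
    /* volatile: each iteration must really read the input and write
     * the output, so the calls cannot be optimized away. */
    volatile double input = 3.0;
    volatile double output = 0.0;

    clock_t start = clock();
    for (long i = 0; i < iterations; ++i)
        output = function(input);
    clock_t stop = clock();

    (void)output; /* result is intentionally discarded */
    return (double)(stop - start) / CLOCKS_PER_SEC / (double)iterations;
}
```

Passing sqrt from <math.h>, as in time_calls(sqrt, 100000000L), and subtracting the figure for baseline gives a rough per-call time. Converting that to cycles requires knowing the processor's clock rate, and because successive iterations do not depend on one another, the figure reflects throughput rather than latency.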
Eric Postpischil