First, let me get the obvious because of this: you should use SSE instead of x87. The SSE sqrtss and sqrtsd do exactly what you want, are supported on all modern x86 systems, and also significantly faster.
Now, if you insist on using x87, I will start with the good news: you do not need to do anything for the float. To calculate a correctly rounded square root in p-bit floating point format, you need bit 2p + 2 . Since 80 > 2*24 + 2 , additional rounding to one precision will always be correctly rounded, and you have a correctly rounded square root.
Now the bad news is: 80 < 2*53 + 2 , so there is no such luck for double precision. I can offer some workarounds; here is a good light from the top of the head.
- let
y = round_to_double(x87_square_root(x)); - use the product Dekker (head-tail) to calculate
a and b so that y*y = a + b sure. - calculate the residual
r = x - a - b . if (r == 0) return yif (r > 0) , let y1 = y + 1 ulp and calculate a1 , b1 st y1*y1 = a1 + b1 . Compare r1 = x - a1 - b1 with r and return either y or y1 , depending on which one has the smallest remainder (or the one that has the least significant bit zero if the remainders are equal in magnitude).if (r < 0) , do the same for y1 = y - 1 ulp .
In this mode, only the default rounding mode is processed; however, in directional rounding modes, just rounding to the destination format does the right thing.
Stephen canon
source share