Compiler flags to improve the accuracy of intermediate floating point calculations

Question

Compiler flags to improve the accuracy of intermediate floating point calculations

Is there a gcc / clang flag that sets the precision of the floating point intermediate calculation?

Suppose I have C code

double x = 3.1415926; double y = 1.414; double z = x * y;

Is there a compiler flag that allows us to calculate "x * y" with the maximum possible precision of the user machine, say, with doubled (64-bit mantissa), and then truncate back to double (53-bit mantissa, the accuracy of the declared variable type)?

For information only, I use Ubuntu 14.04 on a 64-bit machine.

+5

c ++ c floating-point precision compilation

zell Aug 7 '16 at 15:46

source share

1 answer

Basezen · Accepted Answer · 2016-08-07T16:48:40+0000

Gcc

[Edit observed behavior of gcc 4.8.4, where the default behavior is the opposite of documentation]

You need to use 80 bit registers in x87 FPU. With -mfpmath=387 you can override the default SSE registers XMM0-XMM7. This, by default, actually gives you IEEE behavior, where 64-bit registers are used at each step.

See: https://gcc.gnu.org/wiki/x87note

Thus, by default, x87 arithmetic does not match the 64/32 bit IEEE, but receives extended precision from the x87 block. However, at any time, the value has moved from the registers to the IEEE storage of 64 or 32 bits; this 80-bit value should be rounded to the corresponding number of bits.

If your operation is extremely complex, however, a registry spill may occur; the FP register stack is only a depth of 8. Therefore, when a leak copies to the RAM cell in size, you get rounding. You need to either declare a long double yourself in this case, or round manually at the end, or check the assembler output for an explicit leak.

More about registers here: https://software.intel.com/en-us/articles/introduction-to-x64-assembly

In particular, the XMM0 ... 7 registers, while 128 bits wide, are intended only for two simultaneous 64-bit FP operations. So you want to see stack-driven FPR registers with the instructions FLD (load), FMUL (multiply) and FSTP (save and pop).

So, I compiled this code:

 double mult(double x, double y) { return x * y; }

with:

 gcc -mfpmath=387 -Ofast -o precision.s -S precision.c

And received:

 mult: .LFB24: .cfi_startproc movsd %xmm1, -8(%rsp) fldl -8(%rsp) movsd %xmm0, -8(%rsp) fldl -8(%rsp) fmulp %st, %st(1) fstpl -8(%rsp) movsd -8(%rsp), %xmm0 ret .cfi_endproc

Now everything makes sense. Floating point values are passed through the XMM0 and XMM1 registers (although they have to do weird round trips before they can be pushed onto the FPR stack), and the result is returned to XMM0 according to the Intel link above. Not sure why there is no simple FLD instruction directly from XMM0 / 1, but apparently the command set does not.

If you compare with -mfpmath=sse , in the latter case much less needs to be done, because the operands are ready and waiting in the XMM0 / 1 registers, and this is as simple as a single MULSD command.

Compiler flags to improve the accuracy of intermediate floating point calculations

More articles: