Convert mantissa and exponent to double

In a very high-performance application, we find that the processor can do long arithmetic much faster than double arithmetic. We have also found that our system never needs more than 9 decimal places of precision. So we use longs for all floating-point arithmetic, with 9 implied decimal places.

However, in some parts of the system it is more convenient, for readability, to work with doubles. So we have to convert between the long value, which assumes 9 decimal places, and a double.

We find that simply taking the long and dividing by 10 to the power of 9, or multiplying by 1 divided by 10 to the power of 9, gives inexact representations in double.

To solve this, we use Math.Round(value, 9) to get the exact values.
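For illustration, a minimal sketch of the naive conversion and the rounding workaround described above (the sample value is my own):

```csharp
using System;

// A long with 9 implied decimal places; 1_100_000_000 encodes 1.1.
long fixedPoint = 1_100_000_000L;

// Naive conversion: 1.1 has no exact binary representation, so the
// division can only produce the nearest representable double.
double naive = fixedPoint / 1e9;

// Workaround from the question: round to 9 decimal places.
double rounded = Math.Round(naive, 9);

Console.WriteLine(rounded);
```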

However, Math.Round() is terribly slow.

So our current idea is to write the mantissa and exponent directly into the double's binary format, since that way, presumably, no rounding step will be needed.

We learned online how to examine the bits of a double to extract the mantissa and exponent, but we are struggling to figure out how to reverse this: take a mantissa and exponent and build the double by setting its bits.

Any suggestions?

```csharp
[Test]
public unsafe void ChangeBitsInDouble()
{
    var original = 1.0D;
    long bits;
    double* dptr = &original;
    //bits = *(long*) dptr;
    bits = BitConverter.DoubleToInt64Bits(original);
    var negative = (bits < 0);
    var exponent = (int) ((bits >> 52) & 0x7ffL);
    var mantissa = bits & 0xfffffffffffffL;
    if (exponent == 0)
    {
        exponent++;
    }
    else
    {
        mantissa = mantissa | (1L << 52);
    }
    exponent -= 1075;
    if (mantissa == 0)
    {
        return;
    }
    while ((mantissa & 1) == 0)
    {
        mantissa >>= 1;
        exponent++;
    }
    Console.WriteLine("Mantissa " + mantissa + ", exponent " + exponent);
}
```
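For the reverse direction, a hedged sketch of packing a mantissa and unbiased exponent back into the 64-bit pattern, mirroring the decomposition above. The `Compose` name is mine, and subnormals, infinities, NaN and out-of-range exponents are deliberately not handled:

```csharp
using System;

Console.WriteLine(Compose(false, 1, 0));    // 1
Console.WriteLine(Compose(false, 5, -1));   // 2.5  (5 * 2^-1)
Console.WriteLine(Compose(true, 1, 0));     // -1

static double Compose(bool negative, long mantissa, int exponent)
{
    if (mantissa == 0) return negative ? -0.0 : 0.0;
    if ((mantissa >> 53) != 0)
        throw new ArgumentOutOfRangeException(nameof(mantissa), "must fit in 53 bits");

    // Normalize so the implicit leading 1 sits at bit 52, adjusting the
    // exponent for every shift.
    while ((mantissa & (1L << 52)) == 0) { mantissa <<= 1; exponent--; }

    long biased = exponent + 1075L;              // undo the -1075 bias from the decomposition
    long bits = (biased << 52) | (mantissa & 0xfffffffffffffL);
    if (negative) bits |= long.MinValue;         // set the sign bit
    return BitConverter.Int64BitsToDouble(bits);
}
```

BitConverter.Int64BitsToDouble is the exact inverse of the DoubleToInt64Bits call in the test above, so round-tripping a decomposed value reproduces the original double bit-for-bit.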
Tags: c#, double, floating-point, exponent, mantissa
2 answers

You should not use a scale factor of 10^9; use 2^30 instead. Since 2^30 is a power of two, the conversion to double is exact and needs no rounding.
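A minimal sketch of this suggestion, with sample values of my own. The key point is that 1/2^30 is itself a power of two, so both it and the multiplication are exact as long as the value fits in the double's 53 mantissa bits:

```csharp
using System;

// Fixed point with 30 binary (not decimal) places; 2^-30 ≈ 9.3e-10,
// which is finer than 9 decimal places.
const long Scale = 1L << 30;

long fixedPoint = 3L * Scale + 1;          // encodes 3 + 2^-30

// Exact: 1.0 / Scale is a power of two, and the product fits in 53 bits.
double exact = fixedPoint * (1.0 / Scale);

Console.WriteLine(exact == 3.0 + 1.0 / Scale);   // True
```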


As you have already gathered from the other answer, doubles work with binary floating point rather than decimal floating point, so the original approach cannot work.

It is also unclear whether a deliberately simplified formula could work, because it is unclear what range of values you need to cover, and hence whether rounding becomes unavoidable.

Converting between integer and floating point is fast, extremely well understood, and often supported directly by CPU instructions. Your only chances of beating the built-in conversions are:

  • You have made a mathematical breakthrough worth writing serious papers about.
  • You exclude enough cases that cannot occur in your data that, while the built-in conversion is better in general, yours is optimized for your specific use.

If the range of values you use is very limited, the scope for shortcuts when converting between IEEE 754 double precision and a long integer grows.

If you have to cover most of the cases handled by IEEE 754, or even a significant portion of them, you will end up doing things more slowly.

I would recommend either staying with what you have, keeping the cases where a double would be more convenient working with the long despite the inconvenience, or, if necessary, using decimal. You can easily create a decimal from a long with:

```csharp
private static decimal DivideByBillion(long l)
{
    if (l >= 0)
        return new decimal((int)(l & 0xFFFFFFFF), (int)(uint)(l >> 32), 0, false, 9);
    l = -l;
    return new decimal((int)(l & 0xFFFFFFFF), (int)(uint)(l >> 32), 0, true, 9);
}
```

Now, decimal values are slower to do arithmetic with than double (precisely because decimal implements an approach similar to the one in your question, but with a variable exponent and a larger mantissa). But if you just want a convenient way to get a value for display or rendering as a string, then handling the conversion to decimal manually beats hacking the conversion to double, so it may be worth a look.
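To make that concrete, a self-contained sketch of the same construction (the sample values are mine): the long's low and high 32 bits feed the decimal(int, int, int, bool, byte) constructor, with a fixed scale of 9 decimal places.

```csharp
using System;

class DecimalFromLongDemo
{
    // Same shape as the helper above. Note long.MinValue is not handled,
    // since negating it overflows.
    static decimal DivideByBillion(long l)
    {
        bool negative = l < 0;
        if (negative) l = -l;
        return new decimal((int)(l & 0xFFFFFFFF), (int)(uint)(l >> 32), 0, negative, 9);
    }

    static void Main()
    {
        Console.WriteLine(DivideByBillion(1_100_000_000L));   // the long encodes 1.1
        Console.WriteLine(DivideByBillion(-123_456_789L));    // encodes -0.123456789
    }
}
```

Because the scale byte is 9, the decimal holds the 9-decimal-place value exactly and prints with all nine places intact.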

