Floating point range reduction

Question


I am implementing 32-bit floating-point trigonometry in C# using Mono, hopefully utilizing Mono.Simd. Currently, the only thing I'm missing is a solid range reduction. I'm rather stuck because, apparently, Mono's SIMD extensions do not include conversions between floats and integers, which means I have no access to rounding/truncation, which would be the usual method. However, I can convert bitwise between int and float.

Is it possible to do something like this? I can scale the domain up and down if necessary, but ideally the range reduction should result in the domain [0, 2 pi] or [-pi, pi]. I suspect that some IEEE magic could be done with the exponent if the domain is a power of 2, but I'm really not sure how to do this.

Edit: Well, I tried messing around with this C code, and it seems I'm on the verge of something (it does not work, but the fractional part is always correct, in decimal / base 10 at least...). The basic principle seems to be to take the difference between the exponent of your domain and the exponent of the input value, and to compose a new float with a shifted mantissa and an adjusted exponent. But this won't work for negative values, and I don't know how to handle domains that are not powers of 2 (or anything fractional; in fact, nothing but 2 works!).

    // here another more correct attempt:
    float fmodulus(float val, int domain)
    {
        const int mantissaMask = 0x7FFFFF;
        const int exponentMask = 0x7F800000;
        int ival = *(int*)&val;
        int mantissa = ival & mantissaMask;
        int rawExponent = ival & exponentMask;
        int exponent = (rawExponent >> 23) - (129 - domain);
        // powers over one:
        int p = exponent;
        mantissa <<= p;
        rawExponent = exponent >> p;
        rawExponent += 127;
        rawExponent <<= 23;
        int newVal = rawExponent & exponentMask;
        newVal |= mantissa & mantissaMask;
        float ret = *(float*)&newVal;
        return ret;
    }

    float range_reduce(float value, int range)
    {
        const int mantissaMask = 0x7FFFFF;
        const int exponentMask = 0x7F800000;
        int ival = *(int*)&value;
        // grab exponent:
        unsigned exponent = (ival & exponentMask) >> 23;
        // grab mantissa:
        unsigned mantissa = ival & mantissaMask;
        // remove bias, and see how much the exponent is over range/domain
        unsigned char erange = (unsigned char)(exponent - (125 + range));
        // check if sign bit is set - that is, the exponent is under our range
        if (erange & 0x80)
        {
            // don't do anything then.
            erange = 0;
        }
        // shift mantissa (and chop off bits) by the reduced amount
        int inewVal = (mantissa << (erange)) & mantissaMask;
        // add exponent, and subtract the amount we reduced the argument with
        inewVal |= ((exponent - erange) << 23) & exponentMask;
        // reinterpret
        float newValue = *(float*)&inewVal;
        return newValue;
        //return newValue - ((erange) & 0x1 ? 1.0f : 0.0f);
    }

    int main()
    {
        float val = 2.687f;
        int ival = *(int*)&val;
        float correct = fmod(val, 2);
        float own = range_reduce(val, 2);
        getc(stdin);
    }
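For reference, the field extraction that the code above relies on can be isolated into a small C sketch. The names `float_fields`, `float_to_bits`, `bits_to_float`, `split_float` and `join_float` are hypothetical helpers, not part of the question, and `memcpy` is used here instead of the pointer casts only because it is the well-defined way to reinterpret bits in C.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper type: the three fields of an IEEE-754 binary32 value. */
typedef struct {
    uint32_t sign;      /* 0 or 1 */
    uint32_t exponent;  /* biased exponent, 0..255 (bias is 127) */
    uint32_t mantissa;  /* the 23 stored fraction bits */
} float_fields;

/* Reinterpret the bits of a float as an unsigned 32-bit integer. */
uint32_t float_to_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Reinterpret an unsigned 32-bit integer as a float. */
float bits_to_float(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Split a float into sign, biased exponent and stored mantissa bits. */
float_fields split_float(float f)
{
    uint32_t u = float_to_bits(f);
    float_fields r;
    r.sign = u >> 31;
    r.exponent = (u >> 23) & 0xFFu;
    r.mantissa = u & 0x7FFFFFu;
    return r;
}

/* Reassemble a float from its three fields. */
float join_float(float_fields p)
{
    return bits_to_float((p.sign << 31) | (p.exponent << 23) | p.mantissa);
}
```

For instance, 5.5f is 1.375 * 2^2, so `split_float(5.5f)` gives a biased exponent of 129 and a mantissa of 0x300000.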

Edit 2:

Well, I'm really trying to figure this out from the perspective of the IEEE binary representation. If we write out a modulus operation as follows:

    output = input % 2

    [exponent] + [mantissa_bit_n_times_exponent]

    3.5   = [2]  + [1 + 0.5]                     -> [1] + [0.5]      = 1.5
    4.5   = [4]  + [0 + 0 + 0.5]                 -> [0.5] + [0]      = 0.5
    5.5   = [4]  + [0 + 1 + 0.5]                 -> [1] + [0.5]      = 1.5
    2.5   = [2]  + [0 + 0.5]                     -> [0.5] + [0]      = 0.5
    2.25  = [2]  + [0 + 0 + 0.25]                -> [0.25]           = 0.25
    2.375 = [2]  + [0 + 0 + 0.25 + 0.125]        -> [0.25] + [0.125] = 0.375
    13.5  = [8]  + [4 + 0 + 1 + 0.5]             -> [1] + [0.5]      = 1.5
    56.5  = [32] + [16 + 8 + 0 + 0 + 0 + 0.5]    -> [0.5]            = 0.5

We can see that in all cases the output is a new number without the original exponent, in which the mantissa has been shifted into the exponent by some amount (an amount based on the exponent and on the first nonzero mantissa bit that remains once the leading mantissa bits covered by the exponent are ignored). But I'm not quite sure that this is the right approach; it just works well on paper.
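One way to turn the paper exercise above into code, for the special case of a non-negative finite input and a domain that is a power of two, is to clear the mantissa bits that weigh less than the domain and subtract the result from the input. This is only a sketch of that reading, not the asker's code; the name `mod_pow2` and the parameter `k` (the domain is 2^k) are assumptions.

```c
#include <stdint.h>
#include <string.h>

/* Sketch: x mod 2^k for finite x >= 0, using a bit mask and one subtraction.
   mod_pow2 and k are hypothetical names introduced for illustration. */
float mod_pow2(float x, int k)
{
    uint32_t u;
    memcpy(&u, &x, sizeof u);

    int e = (int)((u >> 23) & 0xFFu) - 127;  /* unbiased exponent */
    if (e < k)
        return x;            /* x < 2^k already (covers zero and subnormals) */

    int drop = 23 - (e - k); /* number of mantissa bits weighing less than 2^k */
    if (drop <= 0)
        return 0.0f;         /* every bit weighs at least 2^k: x is a multiple */

    u &= ~((1u << drop) - 1u);  /* largest multiple of 2^k that is <= x */

    float multiple;
    memcpy(&multiple, &u, sizeof multiple);
    return x - multiple;     /* exact: only the cleared low bits remain */
}
```

With `k = 1` this reproduces the table rows, for example `mod_pow2(13.5f, 1)` is 1.5f and `mod_pow2(56.5f, 1)` is 0.5f. Negative inputs, infinities and NaNs are deliberately left out of the sketch.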

Edit 3: I am stuck on Mono version 2.0.50727.1433

+5

c# ieee-754 sse simd mono

Shaggi Apr 3 '15 at 10:21


2 answers

Check your Mono version, because ConvertToInt and ConvertToIntTruncated were added 4 years ago and should be present since release 2.10.
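When a truncating float-to-int conversion is available, the usual reduction divides by the period, rounds to the nearest integer, and subtracts that many periods. The following is a scalar C stand-in for that idea, not Mono.Simd code; the function name `reduce_pi` and the single-precision constants are assumptions made for illustration.

```c
/* Sketch: reduce x to roughly [-pi, pi] with a truncating conversion.
   reduce_pi is a hypothetical name; this is scalar C, not Mono.Simd. */
float reduce_pi(float x)
{
    const float two_pi = 6.2831853f;
    const float inv_two_pi = 0.15915494f;

    float q = x * inv_two_pi;

    /* Round to nearest: push away from zero by 0.5, then truncate. */
    int k = (int)(q + (q < 0.0f ? -0.5f : 0.5f));

    /* Subtract k whole periods. */
    return x - (float)k * two_pi;
}
```

For example, `reduce_pi(7.0f)` is about 0.7168. The sketch is only meaningful while `k` fits in an `int`, and because `two_pi` is rounded to single precision, the error of the result grows with the size of `k`.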

+1

Jester Apr 6 '15 at 16:06


Douglas Zare · Accepted Answer · 2015-04-12T05:31:05+0000

You can reduce the problem to taking a float mod 1. To simplify this, you can compute the floor of the float using bit operations, and then use floating-point subtraction. Here is some (unsafe) C# code for these operations:

    // domain is assumed to be positive
    // returns value in [0,domain)
    public float fmodulus(float val, float domain)
    {
        if (val < 0)
        {
            float negative = fmodulus(-val, domain);
            if (domain - negative == domain)
                return 0;
            else
                return domain - negative;
        }
        if (val < domain)
            return val; // this avoids losing accuracy
        return fmodOne(val / domain) * domain;
    }

    // assumes val >= 1, so val is positive and the exponent is at least 0
    unsafe public float fmodOne(float val)
    {
        int iVal = *(int*)&val;
        int uncenteredExponent = iVal >> 23;
        int exponent = uncenteredExponent - 127; // 127 corresponds to 2^0 times the mantissa
        if (exponent >= 23)
            return 0; // not enough precision to distinguish val from an integer
        int unneededBits = 23 - exponent; // between 0 and 23
        int iFloorVal = (iVal >> unneededBits) << unneededBits; // equivalent to using a mask to zero the bottom bits of the mantissa
        float floorVal = *(float*)&iFloorVal; // convert the bit pattern back to a float
        return val - floorVal;
    }
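To make the two routines above easy to try outside of Mono, here is a rough C port. It is a translation sketch under the same assumptions as the C# version (positive `domain`, finite inputs); the snake_case name `fmod_one` and the use of `memcpy` in place of the pointer casts are choices made for this port, not part of the answer.

```c
#include <stdint.h>
#include <string.h>

/* Fractional part of val. Assumes val >= 1 and finite, so the sign bit is 0
   and the unbiased exponent is at least 0. */
float fmod_one(float val)
{
    uint32_t bits;
    memcpy(&bits, &val, sizeof bits);

    int exponent = (int)(bits >> 23) - 127;
    if (exponent >= 23)
        return 0.0f;  /* val has no fractional bits left */

    int unneeded = 23 - exponent;           /* mantissa bits below 2^0 */
    bits = (bits >> unneeded) << unneeded;  /* zero them: bit pattern of floor(val) */

    float floor_val;
    memcpy(&floor_val, &bits, sizeof floor_val);
    return val - floor_val;
}

/* domain is assumed to be positive; returns a value in [0, domain). */
float fmodulus(float val, float domain)
{
    if (val < 0.0f) {
        float negative = fmodulus(-val, domain);
        return (domain - negative == domain) ? 0.0f : domain - negative;
    }
    if (val < domain)
        return val;  /* already in range, no division needed */
    return fmod_one(val / domain) * domain;
}
```

As a quick check, `fmodulus(3.5f, 2.0f)` is 1.5f and `fmodulus(-0.5f, 2.0f)` is 1.5f. Note that for a domain such as 2 pi, the division and the multiplication each add a rounding step that the mod 1 core itself does not have.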

For example, fmodulus(100.1f, 1) is 0.09999847. The bit pattern of 100.1f is

0 10000101 10010000011001100110011

The bit pattern of floorVal (100f) is

0 10000101 10010000000000000000000

Floating-point subtraction gives something close to 0.1f:

0 01111011 10011001100110000000000

Actually, I was surprised that the last 8 bits were zeroed out. I thought that only the last 6 bits of 0.1f should be replaced with 0. Perhaps one can do better than relying on floating-point subtraction.
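The bit patterns quoted above can be reproduced with a few lines of C, which also helps to see where the zero bits come from. Adjacent floats near 100 are 2^-17 apart, so 100.1f carries no information below 2^-17, and the subtraction 100.1f - 100f is exact; the zero low bits of the result therefore reflect the rounding that happened when 100.1 was stored, not an error in the subtraction. The helper names `bits_of` and `format_bits` are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of a float as an unsigned 32-bit integer. */
uint32_t bits_of(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Write f as "s eeeeeeee mmmmmmmmmmmmmmmmmmmmmmm" into out.
   out must hold 35 bytes: 32 bits, 2 separating spaces, terminating NUL. */
void format_bits(float f, char out[35])
{
    uint32_t u = bits_of(f);
    int pos = 0;
    for (int i = 31; i >= 0; --i) {
        out[pos++] = ((u >> i) & 1u) ? '1' : '0';
        if (i == 31 || i == 23)
            out[pos++] = ' ';  /* separate sign, exponent and mantissa */
    }
    out[pos] = '\0';
}
```

Calling `format_bits` on 100.1f, on 100f, and on their difference yields the three patterns shown in this answer.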
