How to convert unsigned int to float?

Question

I need to create a function that returns the bit-level equivalent of (float) x without using any floating-point data types, operations, or constants. I think I have it, but when I run the test file it reports an infinite loop. Any debugging help would be appreciated.

I am allowed to use any integer/unsigned operations, including ||, &&, if, and while. In addition, I can use at most 30 operations.

    unsigned float_i2f(int x) {
        printf("\n%i", x);
        if (!x) {return x;}
        int mask1 = (x >> 31);
        int mask2 = (1 << 31);
        int sign = x & mask2;
        int complement = ~x + 1;
        //int abs = (~mask1 & x) + (mask1 & complement);
        int abs = x;
        int i = 0, temp = 0;
        while (!(temp & mask2)){
            temp = (abs <<i);
            i = i + 1;
        }
        int E = 32 - i;
        int exp = 127 + E;
        abs = abs & (-1 ^ (1 << E));
        int frac;
        if ((23 - E)>0)
            frac = (abs << (23 - E));
        else
            frac = (abs >> (E - 23));
        int rep = sign + (exp << 23) + frac;
        return rep;
    }

In response to the very useful comments and answers, here is the updated code, which now has a glitch only for 0x80000000:

    unsigned float_i2f(int x) {
        int sign;
        int absX;
        int E = -1;
        int shift;
        int exp;
        int frac;
        // zero is the same in int and float:
        if (!x) {return x;}
        // sign is bit 31: that bit should just be transferred to the float:
        sign = x & 0x80000000;
        // if number is < 0, take two complement:
        if (sign != 0) {
            absX = ~x + 1;
        }
        else
            absX = x;
        shift = absX;
        while ((!!shift) && (shift != -1)) {
            //std::cout << std::bitset<32>(shift) << "\n";
            E++;
            shift = (shift >> 1);
        }
        if (E == 30) { E++;}
        exp = E + 127+24;
        exp = (exp << 23);
        frac = (absX << (23 - E)) & 0x007FFFFF;
        return sign + exp + frac;
    }

Does anyone know where the error is in the revised code? Thanks again!

c type-conversion binary bit unsigned-integer

singmotor Oct 22 '13 at 22:14

3 answers

Floris · Answer 1 · 2013-10-22T23:06:21+0000

There are quite a few opportunities to improve your code and clean it up. First, add comments! Second, to reduce the number of operations, you can combine certain steps. Third, distinguish between integers that can be represented exactly and those that cannot.
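To illustrate the third point: a float has 24 significant bits (23 stored plus the implicit leading 1), so an integer converts exactly only if its magnitude, once trailing zero bits are dropped, fits in 24 bits. Here is a minimal sketch of that test; the helper name fits_in_float is made up for this example and is not part of the assignment.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if x converts to float with no rounding. */
int fits_in_float(int32_t x)
{
    /* Take the magnitude in unsigned arithmetic so INT32_MIN cannot overflow. */
    uint32_t mag = (x < 0) ? 0u - (uint32_t)x : (uint32_t)x;

    if (mag == 0)
        return 1;
    /* Trailing zero bits cost nothing: the exponent absorbs them. */
    while ((mag & 1u) == 0)
        mag >>= 1;
    /* What is left must fit in the 24-bit significand. */
    return mag < (1u << 24);
}
```

With this check, 16777216 (2^24) is reported as exact, while 16777217 (2^24 + 1) is not.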

Here is some sample code that puts some of these things into practice. I could not actually compile and test it, so there may be some errors; I'm trying to show the approach, not do your job for you ...

    #include <stdio.h>

    unsigned float_i2f(int x) {
        // convert integer to its bit-equivalent floating point representation
        // but return it as an unsigned integer
        // format:
        //  1 sign bit
        //  8 exponent bits
        // 23 mantissa bits (plus the 'most significant bit', which is always 1)
        printf("\n%i", x);

        // zero is the same in int and float:
        if (x == 0) { return x; }

        // sign is bit 31: that bit should just be transferred to the float:
        unsigned sign = (unsigned)x & 0x80000000u;

        // if number is < 0, take the two's complement
        // (done in unsigned arithmetic so that INT_MIN does not overflow):
        unsigned absX;
        if (sign != 0) {
            absX = ~(unsigned)x + 1u;
        }
        else {
            absX = (unsigned)x;
        }

        // Take at most 24 bits:
        unsigned int bits23 = 0xFF800000; // bit 23 and everything above it
        unsigned int bits24 = 0xFF000000; // bit 24 and everything above it
        unsigned E = 127 + 23; // exponent when the leading 1 sits at bit 23

        // shift right if there are bits above bit 23:
        while (absX & bits24) {
            E++;
            absX >>= 1; // low bits are dropped (truncation, no rounding)
        }

        // shift left if there are no bits at or above bit 23:
        while (!(absX & bits23)) {
            E--;
            absX <<= 1;
        }

        // now put the numbers we have together in the return value;
        // the implicit leading 1 (bit 23) is masked off:
        return sign | (E << 23) | (absX & ~bits23);
    }
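A function like this is easy to check against the compiler's own conversion, which is fine in a test harness even though it is forbidden inside the function itself. The sketch below is one way to do that, assuming float is an IEEE 754 single; the names reference_bits and i2f_truncate are made up for this example. Because the approach above truncates while the compiler normally rounds to nearest, the two agree only when x is exactly representable (for instance whenever |x| < 2^24).

```c
#include <stdint.h>
#include <string.h>

/* Reference: let the compiler convert, then copy out the bit pattern. */
uint32_t reference_bits(int32_t x)
{
    float f = (float)x;
    uint32_t u;
    memcpy(&u, &f, sizeof u);   /* well-defined way to reinterpret the bits */
    return u;
}

/* Truncating conversion in the style of the answer above (integer ops only). */
uint32_t i2f_truncate(int32_t x)
{
    uint32_t sign, mag, e;

    if (x == 0)
        return 0;
    sign = (uint32_t)x & 0x80000000u;
    mag = sign ? 0u - (uint32_t)x : (uint32_t)x;
    e = 127 + 23;                       /* exponent with the leading 1 at bit 23 */
    while (mag & 0xFF000000u) {         /* too wide: drop low bits */
        mag >>= 1;
        e++;
    }
    while (!(mag & 0x00800000u)) {      /* too narrow: move the leading 1 up */
        mag <<= 1;
        e--;
    }
    return sign | (e << 23) | (mag & 0x007FFFFFu);
}
```

Comparing i2f_truncate(x) with reference_bits(x) over a range of small values is a quick way to catch an exponent that is off by one.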

chux · Answer 2 · 2013-10-23T02:07:34+0000

I tried a solution that works for any int size.
It does not depend on two's complement.
It works with INT_MIN.
I learned a lot from @Floris.

[Edit] Adjusted for rounding and other improvements.

    #include <stdint.h>
    #include <stdio.h>

    int Round(uint32_t Odd, unsigned RoundBit, unsigned StickyBit, uint32_t Result);
    int Inexact;

    // Select your signed integer type: works with any one
    //typedef int8_t integer;
    //typedef int16_t integer;
    //typedef int32_t integer;
    typedef int64_t integer;
    //typedef intmax_t integer;

    uint32_t int_to_IEEEfloat(integer x) {
      uint32_t Result;
      if (x < 0) {  // Note 1
        Result = 0x80000000;
      } else {
        Result = 0;
        x = -x;  // Use negative absolute value. Note 2
      }
      if (x) {
        uint32_t Expo = 127 + 24 - 1;
        static const int32_t m2Power23 = -0x00800000;
        static const int32_t m2Power24 = -0x01000000;
        unsigned RoundBit = 0;
        unsigned StickyBit = 0;
        while (x <= m2Power24) {  // Note 3
          StickyBit |= RoundBit;
          RoundBit = x&1;
          x /= 2;
          Expo++;
        }
        // Round. Note 4
        if (Round(x&1, RoundBit, StickyBit, Result) && (--x <= m2Power24)) {
          x /= 2;
          Expo++;
        }
        if (RoundBit | StickyBit) {  // Note 5
          Inexact = 1; // TBD: Set FP inexact flag
        }
        int32_t i32 = x;  // Note 6
        while (i32 > m2Power23) {
          i32 *= 2;
          Expo--;
        }
        if (Expo >= 0xFF) {
          Result |= 0x7F800000; // Infinity Note 7
        } else {
          Result |= (Expo << 23) | ((-i32) & 0x007FFFFF);
        }
      }
      return Result;
    }

    /*
    Note 1  If `integer` was signed-magnitude or 1s complement, then +0 and -0 exist.
            Rather than `x<0`, this should be a test if the sign bit is set.
            The following `if (x)` will not be taken on +0 and -0.
            This provides that the corresponding float +0.0 and -0.0 are returned.
    Note 2  Overflow will _not_ occur using 2s complement, 1s complement or
            sign magnitude. We are ensuring x at this point is <= 0.
    Note 3  Right shifting may shift out a 1. Use RoundBit and StickyBit to keep
            track of bits shifted out for later rounding determination.
    Note 4  Round as needed here. It may be necessary to shift once more after rounding.
    Note 5  If either RoundBit or StickyBit is set, the floating point inexact flag
            may be set.
    Note 6  Since the `integer` type may be less than 32 bits, we need to convert
            to a 32 bit integer as IEEE float is 32 bits.
    Note 7  Infinity is only expected if `integer` was 129 bits or larger.
    */

    int Round(uint32_t Odd, unsigned RoundBit, unsigned StickyBit, uint32_t Result) {
      // Round to nearest, ties to even
      return (RoundBit) && (Odd || StickyBit);

      // Truncate toward 0
      // return 0;

      // Round away from 0
      // return RoundBit | StickyBit;

      // Round toward -Infinity (magnitude grows only for negative values)
      // return (RoundBit | StickyBit) && Result;
    }

    // For testing
    float int_to_IEEEfloatf(integer x) {
      union {
        float f;
        uint32_t u;
      } xx;  // Overlay a float with a 32-bit unsigned integer
      Inexact = 0;
      printf("%20lld ", (long long) x);
      xx.u = int_to_IEEEfloat(x);
      printf("%08lX ", (unsigned long) xx.u);
      printf("%d : ", Inexact);
      printf("%.8e\n", xx.f);
      return xx.f;
    }

    int main() {
      int_to_IEEEfloatf(0x0);
      int_to_IEEEfloatf(0x1);
      int_to_IEEEfloatf(-0x1);
      int_to_IEEEfloatf(127);
      int_to_IEEEfloatf(-128);
      int_to_IEEEfloatf(12345);
      int_to_IEEEfloatf(32767);
      int_to_IEEEfloatf(-32768);
      int_to_IEEEfloatf(16777215);
      int_to_IEEEfloatf(16777216);
      int_to_IEEEfloatf(16777217);
      int_to_IEEEfloatf(2147483647L);
      int_to_IEEEfloatf(-2147483648L);
      int_to_IEEEfloatf( 9223372036854775807LL);
      int_to_IEEEfloatf(-9223372036854775807LL - 1);  // INT64_MIN without an overflowing literal
      return 0;
    }
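For readers who only need the 32-bit case, the round-bit and sticky-bit idea from the notes above can be condensed considerably. The following is a sketch under that assumption, not a replacement for the code above: it handles int32_t only, assumes two's complement, always rounds to nearest with ties to even, and the name i2f_round is made up for this example.

```c
#include <stdint.h>

/* Sketch: int32 -> IEEE 754 single bits, round to nearest, ties to even. */
uint32_t i2f_round(int32_t x)
{
    uint32_t sign, mag, e, round_bit = 0, sticky = 0;

    if (x == 0)
        return 0;
    sign = (uint32_t)x & 0x80000000u;
    mag = sign ? 0u - (uint32_t)x : (uint32_t)x;
    e = 127 + 23;

    /* Too wide: shift right, remembering what falls off the end. */
    while (mag >= (1u << 24)) {
        sticky |= round_bit;      /* any 1 bit lost before the last one */
        round_bit = mag & 1u;     /* the most recently lost bit */
        mag >>= 1;
        e++;
    }
    /* Round up if above half, or exactly half and the result is odd. */
    if (round_bit && (sticky || (mag & 1u))) {
        mag++;
        if (mag == (1u << 24)) {  /* carry out of the 24-bit significand */
            mag >>= 1;
            e++;
        }
    }
    /* Too narrow: shift left until the leading 1 reaches bit 23. */
    while (mag < (1u << 23)) {
        mag <<= 1;
        e--;
    }
    return sign | (e << 23) | (mag & 0x007FFFFFu);
}
```

The tie case is visible at 16777217 (2^24 + 1), which rounds down to 16777216, while 16777219 rounds up to 16777220.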

Lưu Vĩnh Phúc · Answer 3 · 2013-10-23T02:41:23+0000

Regarding the limit of 30 operations: do loop iterations count toward it?

 if (!x) {return x;}

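handles only positive zero. Why not mask off the sign bit, so that it works for both zeros? (This only applies if the input can actually hold a negative zero; in a two's-complement int, 0x80000000 is INT_MIN and must not be returned unchanged.)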

 if (!(x & 0x7FFFFFFF)) {return x;}

In addition, many of the instructions are not needed, for example:

 complement = ~x + 1;

Just x = -x is enough: since x is not used afterwards, a separate absX or complement variable is redundant. And one negation instruction is faster than two operations, right?

!!shift can also be slower than shift != 0. It is only useful when you need an expression that is exactly 0 or 1; otherwise it is redundant.

Another problem is that signed operations can sometimes be slower than unsigned ones, so unless you need a signed value, you should not declare the variable as int. For example, shift = (shift >> 1) performs an arithmetic shift on a negative value (in most compiler implementations), which may cause an unexpected result.
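To make the shift issue concrete: the revised code in the question loops on a signed shift. When absX ends up negative (which is what happens for 0x80000000 on typical two's-complement machines, since ~x + 1 yields the same bit pattern), an arithmetic right shift keeps filling in sign bits, so the value settles at -1 and never reaches 0. Right-shifting a negative int is implementation-defined in C, so the sketch below counts bit positions with an unsigned variable instead (the function name is made up for this example):

```c
/* Hypothetical helper: index of the highest set bit, or -1 if v == 0. */
int highest_bit_index(unsigned v)
{
    int e = -1;
    while (v != 0) {   /* a logical shift always reaches 0 */
        v >>= 1;
        e++;
    }
    return e;
}
```

With an unsigned operand there is no need for a special shift != -1 test, and 0x80000000 simply yields 31.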

As for finding the highest set bit, dedicated instructions exist for this, so there is no need to shift and test repeatedly. Just find the bit position and shift the value once. If you are not allowed to use intrinsics, there are many fast ways to do this on the Bit Twiddling Hacks page.
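As a rough illustration of "find the bit position, then shift once", here is a sketch that locates the highest set bit with a generic binary search (not copied from Bit Twiddling Hacks) and then aligns the value in a single shift. The names msb_index and i2f_one_shift are made up for this example, it assumes a 32-bit two's-complement int, and it truncates rather than rounds, so results can differ from (float)x by one unit in the last place for magnitudes of 2^24 and above.

```c
#include <stdint.h>

/* Position of the highest set bit of v (v must be nonzero), by binary search. */
static int msb_index(uint32_t v)
{
    int n = 0;
    if (v >> 16) { v >>= 16; n += 16; }
    if (v >> 8)  { v >>= 8;  n += 8;  }
    if (v >> 4)  { v >>= 4;  n += 4;  }
    if (v >> 2)  { v >>= 2;  n += 2;  }
    if (v >> 1)  {           n += 1;  }
    return n;
}

/* Sketch: int32 -> float bits with one shift; extra low bits are truncated. */
uint32_t i2f_one_shift(int32_t x)
{
    uint32_t sign, mag, frac;
    int e;

    if (x == 0)
        return 0;
    sign = (uint32_t)x & 0x80000000u;
    mag = sign ? 0u - (uint32_t)x : (uint32_t)x;
    e = msb_index(mag);

    /* Align the leading 1 to bit 23, then mask it off (it is implicit). */
    frac = (e > 23) ? mag >> (e - 23) : mag << (23 - e);
    return sign | ((uint32_t)(e + 127) << 23) | (frac & 0x007FFFFFu);
}
```

This version has no loops at all, which also helps if loop iterations count toward an operation limit.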
