What's so hard about `uint64_t`? (Assembly for Conversion From `float`)

Question

I am in a situation where I need to calculate something like `size_t s=(size_t)floorf(f);`. That is, the argument is a float, but it has an integer value (suppose that `floorf(f)` is small enough to be represented exactly). While optimizing this, I found something interesting.

Below are some conversions from float to integer types (GCC 5.2.0, -O3). For clarity, the conversion shown is the return value of a test function.

Here is `int32_t x=(int32_t)f;`:

    cvttss2si eax, xmm0
    ret

Here is `uint32_t x=(uint32_t)f;`:

    cvttss2si rax, xmm0
    ret

Here is `int64_t x=(int64_t)f;`:

    cvttss2si rax, xmm0
    ret

Finally, here is `uint64_t x=(uint64_t)f;`:

    ucomiss xmm0, DWORD PTR .LC2[rip]
    jnb .L4
    cvttss2si rax, xmm0
    ret
    .L4:
    subss xmm0, DWORD PTR .LC2[rip]
    movabs rdx, -9223372036854775808
    cvttss2si rax, xmm0
    xor rax, rdx
    ret
    .LC2:
    .long 1593835520

This last one is much more complicated than the rest. Moreover, Clang and MSVC behave similarly. For your convenience, I translated it into pseudo-C:

    float lc2 = (float)(/* 2^63 - 1 */);
    if (f<lc2) {
        return (uint64_t)f;
    } else {
        f -= lc2;
        uint64_t temp = (uint64_t)f;
        temp ^= /* 2^63 */; //Toggle highest bit
        return temp;
    }

It looks like it is trying to make values past the first 64-bit signed overflow convert correctly. This seems spurious, since the documentation for cvttss2si tells me that if an overflow occurs (at 2^32, not 2^64), "the indefinite integer value (80000000H) is returned."

My questions:

What does it really do, and why?
Why isn't something similar done for the other integer types?
How can I change the conversion to produce similar code (only output lines 3 and 4)? Again, suppose the value is exactly representable.

assembly floating-point

imallett 21 sept '15 at 0:02

1 answer

Jester · Accepted Answer · 2015-09-21T00:43:42+0000

Since cvttss2si performs a signed conversion, it will consider numbers in the range [2^63, 2^64) out of range even though they fit in the unsigned range. Therefore, this case is detected and mapped to the low half while still a float, and a correction is applied after the conversion.
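To make the mapping concrete, here is a minimal C sketch of the same idea, using only signed conversions. The function name `float_to_u64_via_signed` is made up for illustration, and the sketch assumes the input is non-negative and below 2^64; it is a reading of the emitted sequence, not the compiler's actual implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: mimics the emitted sequence with signed conversions only. */
uint64_t float_to_u64_via_signed(float f)
{
    /* 2^63 as a float; its bit pattern is 0x5F000000, i.e. the .long 1593835520 constant. */
    const float two63 = 9223372036854775808.0f;

    /* Assumption of this sketch: 0 <= f < 2^64. */
    assert(f >= 0.0f && f < 18446744073709551616.0f);

    if (f < two63) {
        /* Already inside the signed range: one signed conversion is enough. */
        return (uint64_t)(int64_t)f;
    }

    /* Map [2^63, 2^64) down to [0, 2^63), convert, then restore the top bit. */
    int64_t low = (int64_t)(f - two63);
    return (uint64_t)low ^ UINT64_C(0x8000000000000000);
}
```

The subtraction is exact here because both operands are at least 2^63 and share the same float spacing, so no rounding is introduced before the conversion.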

As for the other cases, note that the uint32_t conversion still uses a 64-bit destination, which works for the entire uint32_t range. The further truncation is implicit: only the low 32 bits of the result are used, in accordance with the calling convention.
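In C terms, the uint32_t case behaves roughly like the sketch below: widen to a 64-bit signed conversion, then keep the low 32 bits. The helper name `float_to_u32_via_i64` is hypothetical, and the sketch assumes the value fits in uint32_t.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: models the uint32_t conversion shown in the question. */
uint32_t float_to_u32_via_i64(float f)
{
    /* Assumption of this sketch: 0 <= f < 2^32. */
    assert(f >= 0.0f && f < 4294967296.0f);

    int64_t wide = (int64_t)f;  /* corresponds to cvttss2si rax, xmm0 */
    return (uint32_t)wide;      /* corresponds to the caller reading only eax */
}
```

Every uint32_t value lies well inside the int64_t range, which is why no range check or correction step is needed for this type.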

As for avoiding the extra code, it depends on whether your input can fall into the range mentioned above. If it can, there is no way around it. Otherwise, a double cast, first to signed and then to unsigned, could work, i.e. `(uint64_t)(int64_t)f`.
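A minimal sketch of that double cast, wrapped in a function with a made-up name (`float_to_u64_fast`). It assumes the input is non-negative and below 2^63; the assertion only documents that assumption and can be compiled out with NDEBUG.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: valid only when 0 <= f < 2^63. */
uint64_t float_to_u64_fast(float f)
{
    assert(f >= 0.0f && f < 9223372036854775808.0f);

    /* Signed conversion first, then a value-preserving cast to unsigned. */
    return (uint64_t)(int64_t)f;
}
```

Applied to the expression from the question, this would read `size_t s=(size_t)(int64_t)floorf(f);`. Whether a given compiler then emits only the single cvttss2si is worth confirming by inspecting the generated assembly.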
