Which float values cannot be converted to int without undefined behavior? [C++]

I just read this in the C++14 standard (my emphasis):

4.9 Floating-integral conversions [conv.fpint]

1 A prvalue of a floating point type can be converted to a prvalue of an integer type. The conversion truncates; that is, the fractional part is discarded. The behavior is undefined if the truncated value cannot be represented in the destination type. [...]

This made me wonder:

  • Which values of float, if any, cannot be represented as int after truncation? (Does this depend on the implementation?)
  • If there are any, does this mean that auto x = static_cast<int>(float) is unsafe?
  • What is the correct / safe way to convert float to int, then (assuming you want truncation)?
+6
2 answers

It is no surprise that some float values are outside the range of int. Floating-point values were invented to represent very large (as well as very small) values.

  • INT_MAX + 1 (usually equal to 2147483648) cannot be represented by int, but can be represented by float.
  • Yes, static_cast<int>(float) is as dangerous as undefined behavior gets. However, something as simple as x + y, for sufficiently large integers x and y, is also UB, so there is nothing surprising here.
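  • The correct way depends on your application, as always in C++. Boost has numeric_cast, which throws an exception on overflow; that may be good enough for you. To saturate instead (clamp out-of-range values to INT_MIN and INT_MAX), write something like this: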

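    float f;
    int i;
    ...
    // Bounds are compared as double, which is assumed to hold INT_MIN and INT_MAX exactly.
    if (static_cast<double>(INT_MIN) <= f && f < static_cast<double>(INT_MAX))
        i = static_cast<int>(f);   // in range: truncation is well defined
    else if (f < 0)
        i = INT_MIN;               // too small: saturate
    else
        i = INT_MAX;               // too large (NaN also ends up here): saturate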
    

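    However, this is not ideal. Does your system have a double type that can represent the maximum value of int? If so, it will work. Also, how exactly do you want to treat values that are close to the minimum or maximum value of int? If you do not want to think about such questions, use boost::numeric_cast.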
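The saturation logic above can also be packaged as a standalone function. The following is only a sketch, not the answer's own code: the name `saturating_cast` and the decision to map NaN to 0 are assumptions made for illustration, and it assumes an IEEE 754 float and a 32-bit two's complement int. It compares against the float constant 2^31 directly, because INT_MAX itself is not exactly representable as a float.

```cpp
#include <cassert>
#include <climits>
#include <cmath>

// Hypothetical helper (not from the original answer): saturating float -> int.
// Assumes IEEE 754 float and 32-bit int, so -2^31 and 2^31 are exact floats.
int saturating_cast(float f)
{
    static_assert(INT_MAX == 2147483647, "sketch assumes 32-bit int");
    if (std::isnan(f))
        return 0;                   // arbitrary choice for NaN (an assumption)
    if (f >= 2147483648.0f)         // 2^31 is the first float too large for int
        return INT_MAX;
    if (f < -2147483648.0f)         // anything below -2^31 is too small for int
        return INT_MIN;
    // Invariant: the truncated value is representable, so the cast is defined.
    assert(f >= -2147483648.0f && f < 2147483648.0f);
    return static_cast<int>(f);
}
```

With this sketch, `saturating_cast(1e10f)` yields INT_MAX and `saturating_cast(-3.9f)` yields -3. Whether NaN should map to 0, to a limit, or to an error is an application decision.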

+2

[...] The following assumes IEEE 754 with 4-byte floats and 8-byte doubles, and two's complement integers (int32_t with 4 bytes and int64_t with 8 bytes).

To turn the bit patterns below into floating-point values (without UB), use memcpy.

[...] UB [...] double to int32_t. [...] float min/max [...].

[...] INT_MIN/INT_MAX ([...]) [...].

Converting Inf/NaN is also UB.

// float->int64 edgecases
static const uint32_t FloatbitsMaxFitInt64 = 0x5effffff; // [9223371487098961920] Largest float which still fits into a signed int64
static const uint32_t FloatbitsMinNofitInt64 = 0x5f000000; // [9223372036854775808] Smallest float which is too big for a signed int64
static const uint32_t FloatbitsMinFitInt64 = 0xdf000000; // [-9223372036854775808] Smallest float which still fits into a signed int64
static const uint32_t FloatbitsMaxNotfitInt64 = 0xdf000001; // [-9223373136366403584] Largest float which is too small for a signed int64

// float->int32 edgecases
static const uint32_t FloatbitsMaxFitInt32 = 0x4effffff; // [2147483520] Largest float which still fits into a signed int32
static const uint32_t FloatbitsMinNofitInt32 = 0x4f000000; // [2147483648] Smallest float which is too big for a signed int32
static const uint32_t FloatbitsMinFitInt32 = 0xcf000000; // [-2147483648] Smallest float which still fits into a signed int32
static const uint32_t FloatbitsMaxNotfitInt32 = 0xcf000001; // [-2147483904] Largest float which is too small for a signed int32

// double->int64 edgecases
static const uint64_t DoubleBitsMaxFitInt64 = 0x43dfffffffffffff; // [9223372036854774784] Largest double which fits into an int64
static const uint64_t DoubleBitsMinNofitInt64 = 0x43e0000000000000; // [9223372036854775808] Smallest double which is too big for an int64
static const uint64_t DoubleBitsMinFitInt64 = 0xc3e0000000000000; // [-9223372036854775808] Smallest double which fits into an int64
static const uint64_t DoubleBitsMaxNotfitInt64 = 0xc3e0000000000001; // [-9223372036854777856] Largest double which is too small to fit into an int64

// double->int32 edgecases [when truncating (round towards zero)]
static const uint64_t DoubleBitsMaxTruncFitInt32 = 0x41dfffffffffffff; // [~2147483647.9999998] Largest double that when truncated will fit into an int32
static const uint64_t DoubleBitsMinTruncNofitInt32 = 0x41e0000000000000; // [2147483648.0000000] Smallest double that when truncated won't fit into an int32
static const uint64_t DoubleBitsMinTruncFitInt32 = 0xc1e00000001fffff; // [~-2147483648.9999995] Smallest double that when truncated will fit into an int32
static const uint64_t DoubleBitsMaxTruncNofitInt32 = 0xc1e0000000200000; // [-2147483649.0000000] Largest double that when truncated won't fit into an int32

// double->int32 edgecases [when rounding via banker's method (round to nearest, ties to even)]
static const uint64_t DoubleBitsMaxRoundFitInt32 = 0x41dfffffffdfffff; // [~2147483647.4999998] Largest double that when rounded will fit into an int32
static const uint64_t DoubleBitsMinRoundNofitInt32 = 0x41dfffffffe00000; // [2147483647.5000000] Smallest double that when rounded won't fit into an int32 (the tie rounds to even, 2147483648)
static const uint64_t DoubleBitsMinRoundFitInt32 = 0xc1e0000000100000; // [-2147483648.5000000] Smallest double that when rounded will fit into an int32
static const uint64_t DoubleBitsMaxRoundNofitInt32 = 0xc1e0000000100001; // [~-2147483648.5000005] Largest double that when rounded won't fit into an int32
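Hard-coded bit patterns like these are easy to get wrong, so it can help to re-derive a few of them. The sketch below does that for the float->int32 group using std::nextafter; the helper names `float_bits` and `check_float_int32_edges` are made up for illustration, and the code assumes a 32-bit IEEE 754 float.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical helper: returns the bit pattern of a float via memcpy.
std::uint32_t float_bits(float f)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t), "unexpected float size");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

// Re-derives the float->int32 edge cases instead of trusting the constants.
void check_float_int32_edges()
{
    static_assert(std::numeric_limits<float>::is_iec559, "sketch assumes IEEE 754");
    const float two31 = 2147483648.0f;   // 2^31, exactly representable as float
    const float inf = std::numeric_limits<float>::infinity();

    // The smallest float that is too big for int32 is 2^31 itself.
    assert(float_bits(two31) == 0x4f000000u);
    // Its predecessor is the largest float that still fits.
    assert(float_bits(std::nextafter(two31, 0.0f)) == 0x4effffffu);
    // -2^31 still fits; the next float below it does not.
    assert(float_bits(-two31) == 0xcf000000u);
    assert(float_bits(std::nextafter(-two31, -inf)) == 0xcf000001u);
}
```

The same approach extends to the other groups by switching to double and std::uint64_t and starting from 2^63 or 2^31.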

These can then be used like this:

if( f >= B2F(FloatbitsMinFitInt32) && f <= B2F(FloatbitsMaxFitInt32))
    // cast is valid.

where B2F is:

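#include <stdint.h>  // uint32_t
#include <string.h>  // memcpy

// Reinterprets a 32-bit pattern as a float; memcpy avoids the UB of a pointer cast.
float B2F(uint32_t bits)
{
    static_assert(sizeof(float) == sizeof(uint32_t), "Weird arch");
    float f;
    memcpy(&f, &bits, sizeof(float));
    return f;
}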

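Note that this check also handles NaNs/Inf (the comparisons are false for NaN, and Inf is out of range), unless you compile in a non-IEEE 754 mode (e.g. -ffast-math on gcc or /fp:fast on msvc).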
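Putting the range check and the cast together, a complete checked conversion might look like the sketch below. The names `bits_to_float` and `try_float_to_int32` are hypothetical, the out-parameter interface is just one possible design, and the same IEEE 754 and two's complement assumptions apply.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Same idea as B2F above: reinterpret a bit pattern as a float via memcpy.
static float bits_to_float(std::uint32_t bits)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t), "unexpected float size");
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Hypothetical wrapper: writes the truncated value to *out and returns true
// only when the conversion is well defined; otherwise leaves *out untouched.
bool try_float_to_int32(float f, std::int32_t* out)
{
    assert(out != nullptr);
    const float lo = bits_to_float(0xcf000000u);   // -2147483648
    const float hi = bits_to_float(0x4effffffu);   //  2147483520
    if (!(f >= lo && f <= hi))                     // also rejects NaN and +/-Inf
        return false;
    *out = static_cast<std::int32_t>(f);
    return true;
}
```

A caller would test the return value before using the result, for example `std::int32_t v; if (try_float_to_int32(x, &v)) { /* use v */ }`.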

+7
