How can I read a signed integer from a uint8_t buffer without invoking undefined or implementation-defined behavior?

Here's a simple function that tries to read an integer from a big-endian buffer, where we will assume std::is_signed_v<INT_T>:

    template<typename INT_T>
    INT_T read_big_endian(uint8_t const *data) {
        INT_T result = 0;
        for (size_t i = 0; i < sizeof(INT_T); i++) {
            result <<= 8;
            result |= *data;
            data++;
        }
        return result;
    }

Unfortunately, this has undefined behavior, because the last <<= can shift into the sign bit.


So now we try the following:

    template<typename INT_T>
    INT_T read_big_endian(uint8_t const *data) {
        std::make_unsigned_t<INT_T> result = 0;
        for (size_t i = 0; i < sizeof(INT_T); i++) {
            result <<= 8;
            result |= *data;
            data++;
        }
        return static_cast<INT_T>(result);
    }

But now we invoke implementation-defined behavior in the static_cast, which converts from unsigned to signed.


How can I do this while staying in the "well-defined" realm?

1 answer

Start by assembling the bytes into an unsigned value. Unless you need to assemble groups of 9 or more octets, a conforming C99 implementation is guaranteed to have an unsigned type large enough to hold them all (a C89 implementation is guaranteed to have an unsigned type large enough to hold at least four).

In most cases, when you want to convert a sequence of octets into a number, you will know how many octets to expect. If the data is encoded as 4 bytes, you should use four bytes regardless of the sizes of int and long (a portable function should return type long).

    /* Assemble four little-endian octets into an unsigned 32-bit value. */
    unsigned long octets_to_unsigned32_little_endian(unsigned char *p) {
        return p[0]
             | ((unsigned)p[1] << 8)
             | ((unsigned long)p[2] << 16)
             | ((unsigned long)p[3] << 24);
    }

    long octets_to_signed32_little_endian(unsigned char *p) {
        unsigned long as_unsigned = octets_to_unsigned32_little_endian(p);
        if (as_unsigned < 0x80000000)
            return as_unsigned;  /* sign bit clear: value fits in long */
        else
            /* Clear the sign bit, then subtract 2^31 in two halves. */
            return (long)(as_unsigned ^ 0x80000000UL) - 0x40000000L - 0x40000000L;
    }

Note that the subtraction is performed in two parts, each of which is within the range of long, to accommodate systems where LONG_MIN is -2147483647. Attempting to convert the byte sequence {0,0,0,0x80} on such a system may result in undefined behavior (since it would compute the value -2147483648), but the code should portably handle all values that are within the range of long.
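The same sign-bit split can be carried over to the question's big-endian template. What follows is a minimal C++17 sketch rather than a definitive implementation: the name read_big_endian_signed is hypothetical, and it assumes that INT_T has no padding bits (so its width is sizeof(INT_T) * 8) and that INT_T can represent -2^(N-1), as the fixed-width intN_t types do.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical helper (not from the original post): reads a big-endian
// signed integer without relying on unsigned-to-signed wraparound.
// Assumption: INT_T has no padding bits and can hold -2^(N-1).
template <typename INT_T>
INT_T read_big_endian_signed(std::uint8_t const *data) {
    static_assert(std::is_signed_v<INT_T>, "INT_T must be a signed integer type");
    using UINT_T = std::make_unsigned_t<INT_T>;

    // Step 1: assemble the octets in the unsigned counterpart of INT_T.
    // The cast back to UINT_T undoes integer promotion for narrow types.
    UINT_T bits = 0;
    for (std::size_t i = 0; i < sizeof(INT_T); i++) {
        bits = static_cast<UINT_T>((bits << 8) | data[i]);
    }

    // Mask for the most significant (sign) bit, 2^(N-1).
    constexpr UINT_T sign_bit =
        static_cast<UINT_T>(UINT_T{1} << (sizeof(INT_T) * 8 - 1));

    // Step 2a: sign bit clear, so the value already fits in INT_T and
    // the conversion is value-preserving.
    if ((bits & sign_bit) == 0) {
        return static_cast<INT_T>(bits);
    }

    // Step 2b: sign bit set. Clear it (the remainder fits in INT_T),
    // then subtract 2^(N-1) as two halves of 2^(N-2) so that every
    // intermediate result stays within the range of INT_T.
    constexpr INT_T half = static_cast<INT_T>(sign_bit >> 1);
    INT_T const low = static_cast<INT_T>(bits ^ sign_bit);
    return static_cast<INT_T>(low - half - half);
}
```

For example, the bytes {0xFF, 0xFF, 0xFF, 0xFE} read as std::int32_t yield -2, and {0x80, 0x00} read as std::int16_t yield -32768.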
