How to reinterpret a sequence of bytes as a POD structure without invoking UB?

Suppose we get some data as a sequence of bytes and want to reinterpret this sequence as a structure (given some guarantee that the data really is in the correct format). For example:

    #include <fstream>
    #include <vector>
    #include <cstdint>
    #include <cstdlib>
    #include <iostream>

    struct Data
    {
        std::int32_t someDword[629835];
        std::uint16_t someWord[9845];
        std::int8_t someSignedByte;
    };

    Data* magic_reinterpret(void* raw)
    {
        return reinterpret_cast<Data*>(raw); // BAD! Breaks strict aliasing rules!
    }

    std::vector<char> getDataBytes()
    {
        std::ifstream file("file.bin",std::ios_base::binary);
        if(!file) std::abort();
        std::vector<char> rawData(sizeof(Data));
        file.read(rawData.data(),sizeof(Data));
        if(!file) std::abort();
        return rawData;
    }

    int main()
    {
        auto rawData=getDataBytes();
        Data* data=magic_reinterpret(rawData.data());
        std::cout << "someWord[346]=" << data->someWord[346] << "\n";
        data->someDword[390875]=23235;
        // Print the element just written (printing data->someDword would show an address).
        std::cout << "someDword[390875]=" << data->someDword[390875] << "\n";
    }

Now magic_reinterpret is really bad here because it violates the strict aliasing rules and therefore invokes undefined behavior.

How can this be done without invoking UB and without making any copies of the data (for example, with memcpy)?


EDIT: The getDataBytes() function above should be treated as a function that cannot be changed. The real example is ptrace(2), which on Linux, when request==PTRACE_GETREGSET and addr==NT_PRSTATUS, writes (on x86-64) one of two possible structures of different sizes, depending on the bitness of the tracee, and returns the size. The code calling ptrace cannot predict which structure it will receive until it actually makes the call. How can it then safely reinterpret the result as a pointer of the correct type?

c++ undefined-behavior struct strict-aliasing
2 answers

Don't read the file as a stream of bytes; read it as a stream of Data structures.

For example:

 Data data; file.read(reinterpret_cast<char*>(&data), sizeof(data)); 
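
A slightly fuller sketch of this approach follows. It assumes a small stand-in struct (`SmallData`) and two hypothetical helpers (`readInto`, `roundTrip`) that are not part of the question. The object is created with its real type first, and the bytes are then written through a `char*`, which is allowed to alias any object, so no aliasing rule should be broken:

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>
#include <type_traits>

// Hypothetical stand-in for the question's Data, kept small for illustration.
struct SmallData {
    std::int32_t someDword[4];
    std::uint16_t someWord[2];
    std::int8_t someSignedByte;
};

// Hypothetical helper: fills an existing object of type T from a stream.
// Returns false if fewer than sizeof(T) bytes could be read.
template <class T>
bool readInto(std::istream& in, T& out) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "raw byte I/O is only meaningful for trivially copyable types");
    // Writing an object's storage through char* is permitted by the aliasing rules.
    in.read(reinterpret_cast<char*>(&out), static_cast<std::streamsize>(sizeof(T)));
    return static_cast<bool>(in);
}

// Usage sketch: round-trips an object through an in-memory stream.
inline SmallData roundTrip(const SmallData& src) {
    // Reading an object's bytes through const char* is permitted as well.
    std::string bytes(reinterpret_cast<const char*>(&src), sizeof(src));
    std::istringstream in(bytes);
    SmallData dst{};
    bool ok = readInto(in, dst);
    assert(ok);
    (void)ok;  // silence the unused-variable warning when NDEBUG is defined
    return dst;
}
```

The same pattern applies to a file stream opened with `std::ios_base::binary`. For a type as large as the question's `Data`, the object would likely need static or heap storage rather than the stack.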

I think there is a special exception to the strict aliasing rules for all char types (signed, unsigned, and plain). Therefore, I think that all you need to do is change the signature of magic_reinterpret to:

 Data* magic_reinterpret(char *raw) 

That does not work, I'm afraid. As deviantfan commented, you can read (or write) a Data as a series of [unsigned] char, but you cannot read or write a char buffer as a Data. Joachim's answer is correct.
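
To illustrate the asymmetry, here is a minimal sketch of the permitted direction. The struct `Pair` and the helpers `bytesOf` and `demoBytesOf` are hypothetical names chosen for this example:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical two-field struct used only for this illustration.
struct Pair {
    std::uint16_t a;
    std::uint16_t b;
};

// Permitted: inspecting the bytes of an existing object through unsigned char*.
template <class T>
std::vector<unsigned char> bytesOf(const T& obj) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&obj);
    return std::vector<unsigned char>(p, p + sizeof(T));
}

// Not permitted: the reverse direction, that is, casting a char buffer's
// address to Pair* and accessing members when no Pair object lives there.

// Usage sketch: the byte image has exactly sizeof(Pair) bytes.
inline void demoBytesOf() {
    Pair p{1, 2};
    std::vector<unsigned char> v = bytesOf(p);
    assert(v.size() == sizeof(Pair));
    (void)v;  // silence the unused-variable warning when NDEBUG is defined
}
```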

Having said all that, if you are reading from a network or a file, the additional overhead of reading your input as a series of octets and computing the fields from the buffer will be negligible (and it lets you cope with layout changes between compiler versions and machines).
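
A minimal sketch of computing fields from octets, assuming the input stores integers in little-endian order and contains a 32-bit value, a 16-bit value, and a signed byte back to back. The names `loadLe32`, `loadLe16`, `Header`, and `parseHeader` are hypothetical and the layout is an assumption, not something stated in the question:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: integers in the input are stored in little-endian byte order.
inline std::uint32_t loadLe32(const unsigned char* p) {
    return  static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}

inline std::uint16_t loadLe16(const unsigned char* p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

// Hypothetical record with one field of each kind from the question's Data.
struct Header {
    std::int32_t dword;
    std::uint16_t word;
    std::int8_t signedByte;
};

// Builds a Header from 7 packed octets; no pointer to Header ever aliases buf.
inline Header parseHeader(const unsigned char* buf, std::size_t len) {
    assert(len >= 7);
    (void)len;  // silence the unused-parameter warning when NDEBUG is defined
    Header h;
    h.dword = static_cast<std::int32_t>(loadLe32(buf));
    h.word = loadLe16(buf + 4);
    h.signedByte = static_cast<std::int8_t>(buf[6]);
    return h;
}
```

Because each field is assembled from individual octets, the result does not depend on the host's endianness, padding, or alignment.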
