Serialize any data type as vector<uint8_t> - use reinterpret_cast?
I did not find anything directly related when searching, so please forgive me if this is a duplicate.
What I want to do is serialize data to send over a network connection. My plan is to convert everything I need to send into a std::vector<uint8_t> and, on the receiving side, unpack the data into the corresponding variables. My approach is as follows:

    template <typename T>
    inline void pack (std::vector<uint8_t>& dst, T& data) {
        uint8_t* src = static_cast<uint8_t*>(static_cast<void*>(&data));
        dst.insert (dst.end (), src, src + sizeof (T));
    }

    template <typename T>
    inline void unpack (std::vector<uint8_t>& src, size_t index, T& data) {
        // Copy into the bytes of data; a T* destination would write past the object.
        uint8_t* dst = static_cast<uint8_t*>(static_cast<void*>(&data));
        std::copy (src.begin () + index, src.begin () + index + sizeof (T), dst);
    }

Which I use as

    std::vector<uint8_t> buffer;
    uint32_t foo = 103, bar = 443;
    pack (buffer, foo);
    pack (buffer, bar);

    // And on the receive side
    uint32_t a = 0, b = 0;
    size_t offset = 0;
    unpack (buffer, offset, a);
    offset += sizeof (a);
    unpack (buffer, offset, b);

I'm worried about

    uint8_t* src = static_cast<uint8_t*>(static_cast<void*>(&data));

(which, as I understand it, does the same thing as reinterpret_cast). Is there a better way to do this without the double cast?
My naive approach was to simply use static_cast<uint8_t*>(&data), which failed. I have been told in the past that reinterpret_cast is bad, so I would like to avoid it (or the design I currently have) if possible.
Of course, there is always uint8_t* src = (uint8_t*)(&data).
Suggestions?
My suggestion is to ignore all the people telling you that reinterpret_cast is bad. They tell you it is bad because it is generally not good practice to take the memory layout of one type and pretend that it is a different type. But in this case, that is exactly what you want to do, since your whole goal is to transmit the memory layout as a series of bytes.
It is far better than the double static_cast, because it spells out the fact that you are taking one type and intentionally pretending it is something else. This situation is exactly what reinterpret_cast is for, and dodging it with a void-pointer intermediary just obscures your meaning without any benefit.
Also, I'm sure you are aware of this, but watch out for pointers in T.
Your situation is exactly what reinterpret_cast is for; it is simpler than a double static_cast and clearly documents what you are doing.
To be safe, you should use unsigned char instead of uint8_t:
- doing a reinterpret_cast to unsigned char* and then dereferencing the resulting pointer is safe and portable, and is explicitly permitted by [basic.lval] §3.10/10
- doing a reinterpret_cast to std::uint8_t* and then dereferencing the resulting pointer is a violation of the strict aliasing rule and is undefined behavior if std::uint8_t is implemented as an extended unsigned integer type.
If it exists, uint8_t must always have the same width as unsigned char. However, it need not be the same type; it may be a distinct extended integer type. It also need not have the same representation as unsigned char (see When is uint8_t ≠ unsigned char?).
(This is not entirely hypothetical: making [u]int8_t a special extended integer type allows some aggressive optimizations.)
If you really want uint8_t, you can add:

    static_assert(std::is_same<std::uint8_t, unsigned char>::value,
                  "We require std::uint8_t to be implemented as unsigned char");

so the code will not compile on platforms where it would lead to undefined behavior.
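A minimal sketch of the advice above, assuming the buffer's element type is changed to unsigned char. The names pack_bytes and unpack_bytes are hypothetical, not from the question, and the sketch is only meaningful for trivially copyable types that contain no pointers:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Append the object representation of data to dst.
// Reading any object through an unsigned char pointer is allowed by the aliasing rules.
template <typename T>
void pack_bytes(std::vector<unsigned char>& dst, const T& data) {
    const unsigned char* src = reinterpret_cast<const unsigned char*>(&data);
    dst.insert(dst.end(), src, src + sizeof(T));
}

// Copy sizeof(T) bytes, starting at index, back into data.
template <typename T>
void unpack_bytes(const std::vector<unsigned char>& src, std::size_t index, T& data) {
    // Refuse to read past the end of the buffer.
    assert(index <= src.size() && sizeof(T) <= src.size() - index);
    std::memcpy(&data, src.data() + index, sizeof(T));
}
```

Usage mirrors the question: call pack_bytes(buffer, foo) for each value on the sending side, then unpack_bytes(buffer, offset, a) on the receiving side, advancing offset by sizeof(a) after each read. This still copies the raw host representation, so both ends must agree on size and byte order.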
You can get rid of one cast by exploiting the fact that any object pointer converts implicitly to void*. You might also want to add a few const:

    // Beware, brain-compiled code ahead!
    template <typename T>
    inline void encode (std::vector<uint8_t>& dst, const T& data) {
        const void* pdata = &data;
        // static_cast cannot drop const, so the byte pointer has to be const too
        const uint8_t* src = static_cast<const uint8_t*>(pdata);
        dst.insert (dst.end (), src, src + sizeof (T));
    }

You might want to add a compile-time check that T is a POD, not a struct, and not a pointer.
However, interpreting an object's memory at the byte level is never going to be safe, period. If you have to do it, do it in a nice wrapper (as you did) and get over it. When you port to another platform or compiler, keep an eye on these things.
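The compile-time check suggested above might look like the following sketch. The name encode_checked and the choice of traits are assumptions on my part: std::is_trivially_copyable stands in for the POD requirement, and the other two assertions reject pointers and class types. The checks are not exhaustive (an array of pointers would still pass, for instance):

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>
#include <vector>

template <typename T>
void encode_checked(std::vector<std::uint8_t>& dst, const T& data) {
    // Reject types whose bytes cannot be meaningfully copied around.
    static_assert(std::is_trivially_copyable<T>::value,
                  "T must be trivially copyable");
    static_assert(!std::is_pointer<T>::value,
                  "an address is meaningless on the receiving side");
    static_assert(!std::is_class<T>::value,
                  "structs may contain padding or pointer members");

    const void* pdata = &data;
    const std::uint8_t* src = static_cast<const std::uint8_t*>(pdata);
    dst.insert(dst.end(), src, src + sizeof(T));
}
```

With these assertions in place, a call such as encode_checked(buffer, &foo) fails to compile instead of silently sending an address.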
Here you are not doing any real encoding; you are just copying the raw in-memory representation of the data into an array of bytes and then sending those bytes over the network. This will not work. Here is a quick example:

    struct A { int a; };
    struct B { A* p_a; };

What happens if you use your method to send a B over the network? The recipient receives p_a, the address of some A object on your machine, but that object does not exist on their machine. And even if you sent the A object too, it would not end up at the same address. There is no way this can work if you just send the raw B struct. And that does not even touch the more subtle issues, such as endianness and floating-point representation, which can affect the transmission of simple types such as int and double.
What you are doing now is fundamentally no different from simply casting to uint8_t*, as far as whether it will work or not (it will not, except in the most trivial cases).
What you need to do is develop a serialization method. Serialization means any way of solving this kind of problem: how to get objects out of memory and onto the network in such a way that they can be meaningfully reconstructed on the other side. It is a tricky problem, but it is a well-known one that has been solved many times. Here's a good starting point for reading: http://www.parashift.com/c++-faq-lite/serialization.html
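As a small illustration of what a serialization method does even for a simple type, here is a sketch that writes a 32-bit unsigned integer in a fixed big-endian byte order, so the result does not depend on the endianness of either machine. The function names put_u32_be and get_u32_be are hypothetical, and choosing big-endian as the wire order is an assumption; any order works as long as both sides agree:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Append value to dst as four bytes, most significant byte first.
inline void put_u32_be(std::vector<unsigned char>& dst, std::uint32_t value) {
    dst.push_back(static_cast<unsigned char>((value >> 24) & 0xFF));
    dst.push_back(static_cast<unsigned char>((value >> 16) & 0xFF));
    dst.push_back(static_cast<unsigned char>((value >> 8) & 0xFF));
    dst.push_back(static_cast<unsigned char>(value & 0xFF));
}

// Rebuild the value from four big-endian bytes starting at index.
inline std::uint32_t get_u32_be(const std::vector<unsigned char>& src, std::size_t index) {
    // Refuse to read past the end of the buffer.
    assert(index <= src.size() && src.size() - index >= 4);
    return (static_cast<std::uint32_t>(src[index]) << 24) |
           (static_cast<std::uint32_t>(src[index + 1]) << 16) |
           (static_cast<std::uint32_t>(src[index + 2]) << 8) |
           static_cast<std::uint32_t>(src[index + 3]);
}
```

Because the bytes are produced with shifts on the value rather than by copying its memory, no pointer casts are needed at all. Structs, pointers, and floating-point values need more design than this, which is where the FAQ linked above comes in.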