Conceptual problem in the Union

Question


My code is

    // using_a_union.cpp
    #include <stdio.h>

    union NumericType
    {
        int iValue;
        long lValue;
        double dValue;
    };

    int main()
    {
        union NumericType Values = { 10 };  // iValue = 10
        printf("%d\n", Values.iValue);
        Values.dValue = 3.1416;
        printf("%d\n", Values.iValue);      // garbage value
    }

Why do I get a garbage value when I print Values.iValue after doing Values.dValue = 3.1416? I thought the memory layout would look like this. What happens to Values.iValue and Values.lValue when I assign something to Values.dValue?


c++ c unions

Joel Nov 17 '10 at 4:33


4 answers

Because floating point numbers are represented differently than integers.

All of these variables occupy the same area of memory (with the double obviously occupying the most). If you try to read the first four bytes of that double as an int, you will not get back what you expect. You are dealing with the raw memory layout here, so you need to know how these types are represented.

EDIT: I should have added (as James already pointed out) that writing to one member of a union and then reading from another invokes undefined behavior and should be avoided (unless you are reinterpreting the data as a char array).


Ed S. Nov 17 '10 at 4:36


OK, let's first look at a simpler example. Ed's answer covers the floating-point part, but how about we first look at how ints and chars are stored?

Here is an example that I just coded up:

    #include "stdafx.h"   // MSVC precompiled header, as in the original project
    #include <iostream>
    using namespace std;

    union Color
    {
        int value;
        struct            // anonymous struct: a common compiler extension
        {
            unsigned char R, G, B, A;
        };
    };

    int _tmain(int argc, _TCHAR* argv[])
    {
        Color c;
        c.value = 0xFFCC0000;
        cout << (int)c.R << ", " << (int)c.G << ", " << (int)c.B << ", " << (int)c.A << endl;
        getchar();
        return 0;
    }

What do you expect the output to be?

255, 204, 0, 0

Right?

If an int is 32 bits and each char is 8 bits, then R should correspond to the left-most byte, G to the second, and so on.

But that is wrong. At least on my machine and compiler, ints appear to be stored in reverse byte order. I get:

0, 0, 204, 255

So, for this to produce the result that we would expect (or the result that I would expect, anyway), we have to change the struct to A, B, G, R. This is due to endianness.

In any case, I'm not an expert on this; I just stumbled upon it while trying to decode some binary files. The point is that floats are not necessarily encoded the way you would expect. You need to understand how they are stored internally to understand why you get this result.


mpen Nov 17 '10 at 5:02


You did this:

    union NumericType Values = { 10 };  // iValue = 10
    printf("%d\n", Values.iValue);
    Values.dValue = 3.1416;

The way the compiler uses memory for this union is similar to using a variable of the member type with the largest size and alignment (any one of them, if there are several), and doing a reinterpret cast whenever one of the other types in the union is written or accessed, as in:

    double dValue;  // creates a variable with alignment & space
                    // as per "union NumericType Values"
    *reinterpret_cast<int*>(&dValue) = 10;             // separate step equiv. to = { 10 }
    printf("%d\n", *reinterpret_cast<int*>(&dValue));  // print as int
    dValue = 3.1416;                                   // assign as double
    printf("%d\n", *reinterpret_cast<int*>(&dValue));  // now print as int

The problem is that when you set dValue to 3.1416, you completely overwrite the bits that used to hold the number 10. The new value may look like garbage, but it is simply the result of interpreting the first sizeof(int) bytes of the double 3.1416 as if there were a useful int value there.

If you want the two things to be independent, so that setting the double does not affect the previously stored int, then you should use a struct / class.

It may help to consider this program:

    #include <cstddef>
    #include <cstdint>
    #include <iostream>

    // Prints n bytes starting at pv, most significant bit of each byte first.
    void print_bits(std::ostream& os, const void* pv, std::size_t n)
    {
        for (std::size_t i = 0; i < n; ++i)
        {
            std::uint8_t byte = static_cast<const std::uint8_t*>(pv)[i];
            for (int j = 0; j < 8; ++j)
                os << ((byte & (128 >> j)) ? '1' : '0');
            os << ' ';
        }
    }

    union X { int i; double d; };

    int main()
    {
        X x = { 10 };
        print_bits(std::cout, &x, sizeof x);
        std::cout << '\n';
        x.d = 3.1416;
        print_bits(std::cout, &x, sizeof x);
        std::cout << '\n';
    }

Which, for me, produced this output:

    00001010 00000000 00000000 00000000 00000000 00000000 00000000 00000000
    10100111 11101000 01001000 00101110 11111111 00100001 00001001 01000000

The first half of each line shows the 32 bits that are used for iValue: note the binary 1010 in the least significant byte (on the left on an Intel CPU like mine), which is 10 decimal. Writing 3.1416 changes all 64 bits to a pattern representing 3.1416 (see http://en.wikipedia.org/wiki/Double_precision_floating-point_format ). The old 1010 pattern is overwritten and lost, an electromagnetic memory no more.


Tony delroy Nov 17 '10 at 4:58


James McNellis · Accepted Answer · 2010-11-17T04:37:22+0000

In a union, all of the data members overlap. You can only use one data member of a union at a time.

iValue, lValue and dValue all occupy the same space.

As soon as you write to dValue, the iValue and lValue members are no longer usable: only dValue is usable.

Edit: To respond to the comments below: you cannot write to one data member of a union and then read from another data member. Doing so results in undefined behavior. (There is one important exception: you can reinterpret any object in both C and C++ as an array of char. There are other minor exceptions, such as being able to reinterpret a signed integer as an unsigned integer.) You can find more in both the C Standard (C99 6.5/6-7) and the C++ Standard (C++03 3.10, if I recall correctly).

Might this "work" in practice some of the time? Yes. But unless your compiler expressly states that such reinterpretation is guaranteed to work correctly and specifies the behavior that it guarantees, you cannot rely on it.
