Unions and type-punning

Question

I searched for a while, but cannot find a clear answer.

A lot of people say that using unions for type punning is undefined and bad practice. Why is this? I can't see any reason why it would do anything undefined, considering that the memory you write the original information to is not going to change of its own accord (unless it goes out of scope on the stack, but that is not a union issue, that would be bad design).

People quote the strict aliasing rule, but that seems to me like saying you can't do it because you can't do it.

Also, what is the point of a union if not to type pun? I saw somewhere that unions are supposed to be used to hold different information in the same memory location at different times, but why not just delete the information before using the memory again?

Summarizing:

Why is it bad to use unions for type punning?
What is the point of them if not this?

Additional information: I mainly use C++, but I would like to know about this for C as well. Specifically, I am using unions to convert between floats and the raw hex to send via CAN bus.

+55

c++ c type-punning unions

Matthew Wilkins Sep 04 '14 at 11:56

5 answers

The purpose of unions was to save space when you want to be able to represent different types in the same storage, what we call a variant type; see Boost.Variant for a good example of this.
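As a rough illustration of the space-saving use, here is a minimal sketch of a hand-rolled variant. The names `Value`, `Kind`, `make_integer`, `make_real` and `as_double` are made up for this example and are not taken from Boost.Variant. The tag records which member was written last, and the reader only ever touches that member, so no punning is involved.

```cpp
#include <cstdint>

// Hypothetical tagged union: `kind` records which member was written last.
struct Value {
    enum class Kind { Integer, Real } kind;
    union {                      // both members share the same storage
        std::int64_t integer;
        double real;
    };
};

Value make_integer(std::int64_t v) {
    Value out{};
    out.kind = Value::Kind::Integer;
    out.integer = v;
    return out;
}

Value make_real(double v) {
    Value out{};
    out.kind = Value::Kind::Real;
    out.real = v;                // writing `real` makes it the active member
    return out;
}

// Reads only the member that the tag says is active.
double as_double(const Value& v) {
    return v.kind == Value::Kind::Integer ? static_cast<double>(v.integer)
                                          : v.real;
}
```

Here `as_double(make_integer(3))` converts the stored integer by value, which is a different operation from reinterpreting its bits.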

The other common use is type punning. Its validity is debated, but in practice most compilers support it, and we can see that gcc documents its support:

The practice of reading from a different union member than the one most recently written to (called "type-punning") is common. Even with -fstrict-aliasing, type-punning is allowed, provided the memory is accessed through the union type. So, the code above works as expected.

Note that it says type-punning is allowed even with -fstrict-aliasing, which indicates that there is an aliasing issue at play.

Pascal Cuoq has argued that defect report 283 clarified that this is allowed in C. Defect report 283 added the following footnote as clarification:

If the member used to access the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation.

In C11 that is footnote 95.

However, in the std-discussion mailing list topic Type Punning via a Union the argument is made that this is underspecified, which seems reasonable, since DR 283 did not add new normative wording, just a footnote:

This is, in my opinion, an underspecified semantic quagmire in C. Consensus has not been reached between implementors and the C committee as to exactly which cases have defined behavior and which do not [...]

In C++ it is unclear whether the behavior is defined or not.

This discussion also covers at least one reason why allowing type punning through a union is undesirable:

[...] the C standard's rules break the type-based alias analysis optimizations which current implementations perform.

In other words, it breaks some optimizations. The second argument against it is that using memcpy should generate identical code, does not break optimizations, and has well-defined behavior. For example, this:

 std::int64_t n; std::memcpy(&n, &d, sizeof d);

instead of this:

 union u1 { std::int64_t n; double d; }; u1 u; u.d = d;

We can see with godbolt that this does generate identical code, and the argument is made that if your compiler does not generate identical code, it should be considered a bug:

If this is true for your implementation, I suggest you file a bug on it. Breaking real optimizations (anything based on type-based alias analysis) in order to work around performance issues with some particular compiler seems like a bad idea to me.

The blog post Type Punning, Strict Aliasing, and Optimization also comes to a similar conclusion.

The undefined behavior mailing list discussion Type punning to avoid copying covers a lot of the same ground, and we can see how grey the territory can be.
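Applied to the float-to-raw-bytes case from the question, the memcpy approach might look like the sketch below. The helper names `float_to_bytes` and `bytes_to_float` are hypothetical, and the code assumes a platform where `std::uint8_t` exists (8-bit bytes). The bytes come out in the host's native order, so sender and receiver would still have to agree on endianness.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Hypothetical helper: copy a float's object representation into a byte buffer.
std::array<std::uint8_t, sizeof(float)> float_to_bytes(float value) {
    std::array<std::uint8_t, sizeof(float)> bytes{};
    std::memcpy(bytes.data(), &value, sizeof value);  // no aliasing violation
    return bytes;
}

// Hypothetical helper: rebuild the float from bytes in the same (native) order.
float bytes_to_float(const std::array<std::uint8_t, sizeof(float)>& bytes) {
    float value;
    std::memcpy(&value, bytes.data(), sizeof value);
    return value;
}
```

A round trip such as `bytes_to_float(float_to_bytes(1.5f))` gives back the original value, because the object representation is copied unchanged in both directions.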

+10

Shafik Yaghmour Jun 26 '15 at 19:37

It is legal in C99:

From the standard: 6.5.2.3 Structure and union members

If the member used to access the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation.

+5

Keine Lust 04 Sep '14 at 12:01

SHORT ANSWER: Type punning can be safe in a few circumstances. On the other hand, although it seems to be a very well-known practice, the standard does not seem very interested in making it official.

I will talk only about C (not C++).

1. TYPE PUNNING AND THE STANDARDS

As people have already pointed out, type punning is allowed in the C99 standard, and also in C11, in subsection 6.5.2.3. However, I will restate the facts with my own perception of the issue:

Section 6.5 of the C99 and C11 standard documents develops the topic of expressions.
Subsection 6.5.2 deals with postfix expressions.
Sub-subsection 6.5.2.3 talks about structs and unions.
Paragraph 6.5.2.3(3) explains the dot operator applied to a struct or union object, and which value will be obtained.
Footnote 95 appears just there. This footnote says:

If the member used to access the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation.

The fact that type punning barely appears, and only as a footnote, gives a clue that it is not a relevant issue in C programming.
Actually, the main purpose of unions is to save space (in memory). Since several members share the same address, if one knows that each member will be used in different parts of the program, never at the same time, then a union can be used instead of a struct to save memory.

Subsection 6.2.6 is mentioned.
Subsection 6.2.6 talks about how objects are represented (in memory, say).

2. REPRESENTATION OF TYPES AND ITS TROUBLE

If you pay attention to various aspects of the standard, you can be sure of almost nothing:

The representation of pointers is not clearly specified.
Worse, pointers of different types could have different representations (as objects in memory).
union members share the same starting address in memory, and it is the same address as that of the union object itself.
struct members have increasing relative addresses, starting at exactly the same memory address as the struct object itself. However, padding bytes can be added at the end of every member. How many? It is unpredictable. Padding bytes are used mainly for memory alignment purposes.
Arithmetic types (integers, floating-point real and complex numbers) could be represented in a number of ways. It depends on the implementation.
In particular, integer types could have padding bits. This is not true, I believe, for desktop computers. However, the standard left the door open for this possibility. Padding bits are used for special purposes (parity, signals, who knows), and not for holding mathematical values.
Signed types can be represented in 3 ways: ones' complement, two's complement, or sign and magnitude.
The char types occupy just 1 byte, but 1 byte can have a number of bits different from 8 (but never less than 8).
However, we can be sure about some details:
a. The char types have no padding bits.
b. The unsigned integer types are represented exactly as in pure binary form.
c. unsigned char occupies exactly 1 byte, without padding bits, and there is no trap representation because all the bits are used. Moreover, it represents a value without any ambiguity, following the binary format for integer numbers.
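Some of the points above can be checked at compile time. The sketch below is written in C++ rather than C, and `has_no_padding_bits` is a made-up helper name. It compares the number of value bits reported by `std::numeric_limits` against the storage size, which is only meaningful for integer types other than `bool`.

```cpp
#include <climits>
#include <limits>

// Guaranteed on every conforming implementation.
static_assert(CHAR_BIT >= 8, "a byte has at least 8 bits");
static_assert(sizeof(unsigned char) == 1, "char types occupy exactly one byte");
static_assert(std::numeric_limits<unsigned char>::digits == CHAR_BIT,
              "unsigned char has no padding bits");

// Hypothetical helper: true when integer type T uses every bit of its storage
// (value bits plus the sign bit, if any) and therefore has no padding bits.
template <typename T>
constexpr bool has_no_padding_bits() {
    return std::numeric_limits<T>::digits
               + (std::numeric_limits<T>::is_signed ? 1 : 0)
           == static_cast<int>(sizeof(T) * CHAR_BIT);
}
```

For the char types the result is true by guarantee; for wider types such as `int` the answer is implementation-specific, which is the point being made above.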

3. TYPE PUNNING vs TYPE REPRESENTATION

All these observations reveal that, if we try to do type punning with union members having types other than unsigned char, we could have a lot of ambiguity. It is not portable code and, in particular, our program could behave unpredictably.
However, the standard allows this kind of access.

Even if we know exactly how every type is represented in our implementation, we could still have a sequence of bits that means nothing at all in the other type (a trap representation). We cannot do anything in this case.

4. THE SAFE CASE: unsigned char

The only safe way of using type punning is with unsigned char or unsigned char arrays (because we know that the members of array objects are strictly contiguous and there are no padding bytes when their size is computed with sizeof()).

  union { TYPE data; unsigned char type_punning[sizeof(TYPE)]; } xx;

Since we know that unsigned char is represented in strict binary form, without padding bits, type punning can be used here to take a look at the binary representation of the member data.
This tool can be used to analyze how values of a given type are represented in a particular implementation.
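The same kind of inspection can be sketched in C++ without a union, since an `unsigned char` pointer is allowed to alias any object. The helper name `object_bytes_hex` is made up for this example, and the two-hex-digits-per-byte formatting assumes 8-bit bytes.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: hex dump of the object representation of `value`,
// lowest address first. Assumes CHAR_BIT == 8 for the formatting.
template <typename T>
std::string object_bytes_hex(const T& value) {
    static const char digits[] = "0123456789abcdef";
    // Reading any object through unsigned char* is permitted by the aliasing rules.
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(&value);
    std::string out;
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        out += digits[(bytes[i] >> 4) & 0xF];
        out += digits[bytes[i] & 0xF];
    }
    return out;
}
```

Calling it on a `double` or a struct shows byte order and any padding bytes of the particular implementation, so the output is expected to differ between platforms.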

I am not able to see another safe and useful application of type punning under the standard specifications.

5. A COMMENT ABOUT CASTS...

If one wants to play with types, it is better to define your own conversion functions, or just use casts. We can recall this simple example:

  union {
      unsigned char x;
      double t;
  } uu;

  bool result;

  uu.x = 7;
  (uu.t == 7.0) ? result = true : result = false;
  // You can bet that result == false

  uu.t = (double)(uu.x);
  (uu.t == 7.0) ? result = true : result = false;
  // result == true

+3

pablo1977 Sep 04 '14 at 2:37

There are (or at least were, back in C90) two motivations for making this undefined behavior. The first was that a compiler would be allowed to generate extra code which tracked what was in the union, and generated a signal when you accessed the wrong member. In practice, I don't think anyone ever did (maybe CenterLine?). The other was the optimization possibilities this opened up, and these are used. I have used compilers which would defer a write until the last possible moment, on the grounds that it might not be necessary (because the variable goes out of scope, or there is a subsequent write of a different value). Logically, one would expect this optimization to be turned off when the union was visible, but it wasn't in the earliest versions of Microsoft C.

The issues of type punning are complex. The C committee (back in the late 1980s) more or less took the position that you should use casts (in C++, reinterpret_cast) for this, and not unions, although both techniques were widespread at the time. Since then, some compilers (g++, for example) have taken the opposite point of view, supporting the use of unions, but not the use of casts. And in practice, neither works if it is not immediately obvious that there is type-punning. This might be the motivation behind g++'s point of view. If you access a union member, it is immediately obvious that there might be type-punning. But of course, given something like:

 int f(const int* pi, double* pd) { int results = *pi; *pd = 3.14159; return results; }

called with:

 union U { int i; double d; }; U u; u.i = 1; std::cout << f( &u.i, &u.d );

is perfectly legal according to the strict rules of the standard, but fails with g++ (and probably many other compilers); when compiling f, the compiler assumes that pi and pd can't alias, and reorders the write to *pd and the read from *pi. (I believe that it was never the intent that this be guaranteed. But the current wording of the standard does guarantee it.)

EDIT:

Since other answers have argued that the behavior is in fact defined (largely based on quoting a non-normative note, taken out of context):

The correct answer here is that of pablo1977: the standard makes no attempt to define the behavior when type punning is involved. The probable reason for this is that there is no portable behavior that it could define. This does not prevent a specific implementation from defining it; although I don't remember any specific discussions of the issue, I'm pretty sure that the intent was that implementations define something (and most, if not all, do).

With regard to using a union for type-punning: when the C committee was developing C90 (in the late 1980s), there was a clear intent to allow debugging implementations which did additional checking (such as using fat pointers for bounds checking). From discussions at the time, it was clear that the intent was that a debugging implementation might cache information concerning the last value initialized in a union, and trap if you tried to access anything else. This is clearly stated in §6.7.2.1/16: "The value of at most one of the members can be stored in a union object at any time." Accessing a value that isn't there is undefined behavior; it can be likened to accessing an uninitialized variable. (There were some discussions at the time as to whether accessing a different member with the same type was legal or not. I don't know what the final resolution was, however; after around 1990, I moved on to C++.)

With regard to the quote from C89 saying the behavior is implementation-defined: finding it in section 3 (Terms, Definitions and Symbols) seems very strange. I'll have to look it up in my copy of C90 at home; the fact that it has been removed in later versions of the standard suggests that the committee considered its presence an error.

The use of unions which the standard supports is as a means of simulating derivation. You can define:

 struct NodeBase { enum NodeType type; };
 struct InnerNode { enum NodeType type; struct NodeBase* left; struct NodeBase* right; };
 struct ConstantNode { enum NodeType type; double value; };
 // ...
 union Node {
     struct NodeBase base;
     struct InnerNode inner;
     struct ConstantNode constant;
     // ...
 };

and legally access base.type, even though the Node was initialized through inner. (The fact that §6.5.2.3/6 starts with "One special guarantee is made..." and goes on to explicitly allow this is a very strong indication that all other cases are meant to be undefined behavior. And of course, there is the statement in §4/2 that "Undefined behavior is otherwise indicated in this International Standard by the words 'undefined behavior' or by the omission of any explicit definition of behavior"; in order to argue that the behavior is not undefined, you have to show where it is defined in the standard.)
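A compilable sketch of that pattern is shown here in C++, where the common-initial-sequence rule for standard-layout structs gives a comparable guarantee. The enumerators `Inner` and `Constant` and the helpers `make_constant` and `type_of` are invented for the example, since the snippet above leaves `NodeType` undeclared.

```cpp
// Hypothetical node kinds; the snippet above does not declare NodeType.
enum NodeType { Inner, Constant };

struct NodeBase     { NodeType type; };
struct InnerNode    { NodeType type; NodeBase* left; NodeBase* right; };
struct ConstantNode { NodeType type; double value; };

union Node {
    NodeBase base;
    InnerNode inner;
    ConstantNode constant;
};

Node make_constant(double value) {
    Node node{};                                    // starts with `base` active
    node.constant = ConstantNode{Constant, value};  // `constant` becomes active
    return node;
}

// All three structs begin with a NodeType member, so reading that shared
// leading member through `base` is allowed whichever member was stored.
NodeType type_of(const Node& node) {
    return node.base.type;
}
```

Only the shared leading `type` member may be read this way; reading `node.inner.left` after storing `constant` would fall outside the special guarantee.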

Finally, with regard to type-punning: all implementations (or at least all that I've used) do support it in some way. My impression at the time was that the intent was for pointer casting to be the way an implementation supported it; in the C++ standard, there is even (non-normative) text to suggest that the results of a reinterpret_cast be "unsurprising" to someone familiar with the underlying architecture. In practice, however, most implementations support the use of a union for type-punning, provided the access is through a union member. Most implementations (but not g++) also support pointer casts, provided the pointer cast is clearly visible to the compiler (for some unspecified definition of pointer cast). And the "standardization" of the underlying hardware means that things like:

 int getExponent( double d ) { return ((*(uint64_t*)(&d) >> 52) & 0x7FF) - 1023; }

are actually fairly portable. (It won't work on mainframes, of course.) What doesn't work are things like my first example, where the aliasing is invisible to the compiler. (I'm fairly sure that this is a defect in the standard. I seem to recall even having seen a DR concerning it.)
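For comparison, the same exponent extraction can be sketched with memcpy instead of a pointer cast, which avoids the aliasing question. The function name `exponent_of` is made up here, and the code assumes IEEE 754 binary64 doubles (checked with a `static_assert`) and a normal, finite, non-zero argument. The stored exponent field is biased by 1023, so the bias is subtracted.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(std::numeric_limits<double>::is_iec559 &&
                  sizeof(double) == sizeof(std::uint64_t),
              "this sketch assumes IEEE 754 binary64 doubles");

// Hypothetical helper: unbiased binary exponent of a normal, finite double.
int exponent_of(double d) {
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);   // copy the object representation
    // Bits 52..62 hold the exponent field, stored with a bias of 1023.
    return static_cast<int>((bits >> 52) & 0x7FF) - 1023;
}
```

With these assumptions, `exponent_of(8.0)` is 3 and `exponent_of(0.5)` is -1; zeros, subnormals, infinities and NaNs would need separate handling.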

+2

James Kanze Sep 04 '14 at 13:15

Christoph · Accepted Answer · 2014-09-04 18:40

To reiterate, type-punning through unions is perfectly fine in C (but not in C++). In contrast, using pointer casts to do so violates C99 strict aliasing and is problematic because different types may have different alignment requirements, and you could raise a SIGBUS if you do it wrong. With unions, this is never a problem.

Relevant quotes from C standards:

C89 section 3.3.2.3 §5:

if a member of a union object is accessed after a value has been stored in a different member of the object, the behavior is implementation-defined

C11 section 6.5.2.3 §3:

A postfix expression followed by the . operator and an identifier designates a member of a structure or union object. The value is that of the named member

with the following footnote 95:

If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation.

This should be perfectly clear.

James is confused because C11 section 6.7.2.1 §16 reads

The value of at most one of the members can be stored in a union object at any time.

This seems contradictory, but it is not: in contrast to C++, C has no concept of an active member, and it is perfectly fine to access the single stored value through an expression of an incompatible type.
