Type aliasing / tagged union without union

For two (or more) structs, Base and Sub, with a common first (unnamed) struct, is it safe to convert / cast from Base to Sub and vice versa?

    struct Base{
        struct{
            int id;
            // ...
        };
        char data[]; // necessary?
    };

    struct Sub{
        struct{
            int id;
            // same '...'
        };
        // actual data
    };

Are these functions guaranteed to be safe and technically correct? (Also: is the zero-size char data[] member necessary or useful?)

    struct Base * subToBase(struct Sub * s){
        return (struct Base*)s;
    }

    struct Sub * baseToSub(struct Base * b){
        if(b->id != SUB_ID){
            return NULL;
        }
        return (struct Sub*)b;
    }

Edit

I do not plan to nest further below Sub, but I want to keep the option of adding other subtypes (directly under Base) later without changing Base. My main concern is whether it is safe to convert pointers to the structs between Base and any subtype. References to the standard (C11) would be most appreciated.

Edit 2

Changed the wording slightly to discourage discussion of OOP / inheritance. What I want is a tagged union, without the union, so that it can be extended later. I have no plans for additional nesting. Subtypes that need the features of another subtype can get them explicitly, without any additional nesting.


Context

For a script interpreter, I created a pseudo-object-oriented tagged union, without the union. It has an (abstract) base type Object with several (specific) subtypes such as String, Number, List, etc. Each struct type has the following unnamed struct as its first member:

    #define OBJHEAD struct{ \
        int id; \
        int line; \
        int column; \
    }

id identifies the type of the object; line and column should (also) be self-explanatory. A simplified implementation of the various objects:

    #include <stddef.h> // size_t, NULL

    typedef struct Object{
        OBJHEAD;
        char data[]; // necessary?
    } Object;

    typedef struct Number{
        OBJHEAD;
        int value; // only int for simplicity
    } Number;

    typedef struct String{
        OBJHEAD;
        size_t length;
        char * string;
    } String;

    typedef struct List{
        OBJHEAD;
        size_t size;
        Object * elements; // may be any kind and mix of objects
    } List;

    Object * Number_toObject(Number * num){
        return (Object*)num;
    }

    Number * Number_fromObject(Object * obj){
        if(obj->id != TYPE_NUMBER){ // the tag member is named id in OBJHEAD
            return NULL;
        }
        return (Number*)obj;
    }

I know that the most elegant and technically correct way to do this would be to use an enum for the id and a union for the different subtypes. But I want the type system to be extensible (through some form of type registry) so that types can be added later without changing all Object-related code.

A later / external addition might be:

    typedef struct File{
        OBJHEAD;
        FILE * fp; // requires <stdio.h>
    } File;

without changing Object.

Are these conversions guaranteed to be safe?

(As for the macro: OBJHEAD will of course be extensively documented, so additional implementers will know which member names not to use. The idea is not to hide the header, but to save pasting it every time.)

+6
3 answers

It is allowed to convert a pointer to one object type to a pointer to another object type (for example, via a cast), but if the resulting pointer is not correctly aligned, the behavior is undefined (C11 6.3.2.3/7). Depending on the members of Base and Sub and on implementation-dependent behavior, a Base * converted to a Sub * is not necessarily correctly aligned. For example, given ...

    struct Base{
        struct{
            int id;
        };
        char data[]; // necessary?
    };

    struct Sub{
        struct{
            int id;
        };
        long long int value;
    };

... it is possible that the implementation allows Base objects to be aligned on 32-bit boundaries but requires Sub objects to be aligned on 64-bit boundaries, or even stricter ones.

None of this depends on whether Base has a flexible array member.
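The alignment concern can be checked explicitly. The following is a minimal sketch, not part of the original answer: the helper names `is_aligned_for` and `base_ptr_ok_for_sub` are hypothetical, and the pointer-to-`uintptr_t` conversion is implementation-defined (the sketch assumes the usual flat address model). The flexible array member is left out because it does not affect the comparison.

```c
#include <assert.h>    /* static_assert (C11) */
#include <stdalign.h>  /* alignof (C11) */
#include <stddef.h>
#include <stdint.h>    /* uintptr_t */

struct Base { struct { int id; }; };
struct Sub  { struct { int id; }; long long int value; };

/* Sub is at least as strictly aligned as Base on this implementation,
   so a Sub * converted to Base * is always correctly aligned. */
static_assert(alignof(struct Sub) >= alignof(struct Base),
              "Sub * -> Base * could be misaligned");

/* Hypothetical helper: nonzero if p satisfies the given alignment.
   Assumes uintptr_t values reflect addresses (implementation-defined). */
static int is_aligned_for(const void *p, size_t alignment) {
    return ((uintptr_t)p % alignment) == 0;
}

/* Nonzero if converting b to struct Sub * would give an aligned pointer. */
static int base_ptr_ok_for_sub(const struct Base *b) {
    return is_aligned_for(b, alignof(struct Sub));
}
```

The opposite direction, Base * to Sub *, is the one that can fail: a check like `base_ptr_ok_for_sub()` only rules out misalignment; it says nothing about whether the object really is a Sub.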

Another question is whether it is safe to dereference a pointer value of one type that was obtained by casting a pointer value of another type. First, C imposes only a few restrictions on how implementations lay out structures: members must be laid out in the order in which they are declared, and there must be no padding before the first one, but otherwise implementations have free rein. As far as I know, in your case there is no requirement that the anonymous struct members of your two structures be laid out the same way as each other if they have more than one member. (And if they have only one member, then why use an anonymous structure?) It is also unsafe to assume that Base.data starts at the same offset as the first member following the anonymous structure in Sub.
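Layout assumptions of this kind can at least be verified at compile time. Below is a minimal sketch, assuming a C11 compiler and the OBJHEAD macro from the question; the function name `header_layout_matches` is hypothetical, and the flexible array member is omitted for brevity.

```c
#include <assert.h>  /* static_assert (C11) */
#include <stddef.h>  /* offsetof */

#define OBJHEAD struct { int id; int line; int column; }

typedef struct Object { OBJHEAD; } Object;
typedef struct Number { OBJHEAD; int value; } Number;

/* Compilation fails if the shared header members are not at identical offsets. */
static_assert(offsetof(Object, id) == 0 && offsetof(Number, id) == 0,
              "id must be the first member");
static_assert(offsetof(Object, line) == offsetof(Number, line),
              "line offset differs");
static_assert(offsetof(Object, column) == offsetof(Number, column),
              "column offset differs");

/* The same comparison as a run-time check, e.g. for a start-up self-test. */
static int header_layout_matches(void) {
    return offsetof(Object, id) == offsetof(Number, id)
        && offsetof(Object, line) == offsetof(Number, line)
        && offsetof(Object, column) == offsetof(Number, column);
}
```

Matching offsets only confirm what a particular implementation does; they do not by themselves make access through the other struct type well-defined.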

In practice, dereferencing the result of your subToBase() is probably fine, and you can, of course, implement tests to verify this. In addition, if you have a Base * that was obtained by conversion from a Sub *, then converting it back, for example via baseToSub(), is guaranteed to yield the original Sub * (C11 6.3.2.3/7). In that case, converting to Base * and back has no effect on the safety of dereferencing the pointer as a Sub *.

On the other hand, although I am having trouble finding a reference for it in the standard, I have to say that baseToSub() is very dangerous in the general context. If a Base * that does not actually point to a Sub is converted to Sub * (which is itself permitted), then it is not safe to dereference that pointer to access members that are not shared with Base. In particular, given my declarations above, if the referenced object is actually a Base, then the declared Base.data in no way prevents ((Sub *)really_a_Base_ptr)->value from exhibiting undefined behavior.

To avoid all undefined and implementation-defined behavior, you need an approach that avoids casting and ensures a consistent layout. @LoPiTaL's suggestion to embed a typed Base structure inside the Sub structures is a good approach in that regard.
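The round-trip guarantee can be illustrated with a small sketch. It is not from the original answer; the function name `round_trip_preserves_pointer` is hypothetical, and the flexible array member is omitted.

```c
#include <assert.h>
#include <stddef.h>

struct Base { struct { int id; }; };
struct Sub  { struct { int id; }; long long int value; };

/* Convert Sub * -> Base * -> Sub * and report whether the original pointer
   comes back. C11 6.3.2.3/7 guarantees this as long as the intermediate
   pointer is correctly aligned for struct Base. */
static int round_trip_preserves_pointer(struct Sub *s) {
    assert(s != NULL);
    struct Base *b = (struct Base *)s;
    struct Sub *back = (struct Sub *)b;
    return back == s;
}
```

Only the pointer value is compared here; the sketch never dereferences the intermediate Base *.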

+4

No, this is not safe, at least not under all circumstances. If your compiler sees two pointers p and q that have different base types, it may always assume that they do not alias or, in other words, that *p and *q are different objects.

Your cast punches a hole in that assumption. That is, if you have a function

    double foo(struct A* p, struct B* q) {
        double b = q->field0;
        *p = (struct A){ 0 };
        return b + q->field0; // compiler may return 2*b
    }

the optimizer is allowed to avoid the additional memory read.

If you know that no function will ever see the same object through pointers of different types, you will be safe. But such an assertion is not easily made, so you had better avoid such hacks.

+2

This is correct, since the first member of the struct is guaranteed to have the same alignment, so you can cast from one struct to the other.

However, the common way to implement this behavior is to "inherit" the base class:

    //Base struct definition
    typedef struct Base_{
        int id;
        // ...
        //char data[]; //This is not needed.
    }Base;

    //Subclass definition
    typedef struct Sub_{
        Base base; //Note: this is NOT a pointer
        // actual data
    }Sub;

So now you can cast the Sub struct to the Base struct, or just return the first member, which already has type Base, so casting is no longer necessary.
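A minimal sketch of this embedding, with conversion helpers, might look as follows. The type ids `TYPE_NUMBER` and `TYPE_STRING` and the helper names are hypothetical and exist only for illustration. The downcast relies on C11 6.7.2.1/15, which states that a suitably converted pointer to a structure points to its initial member, and vice versa.

```c
#include <assert.h>  /* static_assert (C11) */
#include <stddef.h>  /* offsetof, NULL */

enum { TYPE_NUMBER = 1, TYPE_STRING = 2 };  /* hypothetical type ids */

typedef struct Base { int id; } Base;

typedef struct Number {
    Base base;   /* embedded by value, must stay the first member */
    int  value;
} Number;

static_assert(offsetof(Number, base) == 0, "base must be the first member");

/* Upcast without any cast: take the address of the embedded Base. */
static Base *Number_toBase(Number *n) {
    return &n->base;
}

/* Checked downcast: only valid if b really is the base of a Number. */
static Number *Number_fromBase(Base *b) {
    if (b == NULL || b->id != TYPE_NUMBER) {
        return NULL;
    }
    return (Number *)b;
}
```

The id check is what keeps the downcast honest: it only helps if every object sets its id correctly when it is created.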

One word of caution: do not abuse macros. Macros are fine and useful for many things, but misusing them can make code hard to read and maintain. In this case, the macro is easily replaced by the base member.

One final word: your macro is error-prone, since the member names are now hidden. Eventually you may add new members with the same name and get strange errors without knowing why.

If you later extend the hierarchy with sub-subclasses, with macros you will have to write out ALL the base-class macros, whereas with the inheritance approach you only have to write the direct base.

Neither of these solutions really solves your problem: inheritance. The only real solution (and the preferred one) would be to move to a truly OO language. Because of its similarity to C, C++ would be the best match, but any other language would do.

+1