Is dereferencing a pointer whose value equals nullptr undefined behavior according to the standard?

A blog author raised a discussion about dereferencing a null pointer:

I posted a few arguments here:

His main line of reasoning citing the standard is this:

The expression '&podhd->line6' is undefined behavior in C when 'podhd' is a null pointer.

The C99 standard says the following about the '&' address-of operator (6.5.3.2 "Address and indirection operators"):

The operand of the unary & operator shall be either a function designator, the result of a [] or unary * operator, or an lvalue that designates an object that is not a bit-field and is not declared with the register storage-class specifier.

The expression 'podhd->line6' is clearly not a function designator or the result of a [] or * operator. It is an lvalue expression. However, when the 'podhd' pointer is NULL, the expression does not designate an object, since 6.3.2.3 "Pointers" says:

If a null pointer constant is converted to a pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal to a pointer to any object or function.

If "an lvalue does not designate an object when it is evaluated, the behavior is undefined" (C99 6.3.2.1 "Lvalues, arrays, and function designators"):

An lvalue is an expression with an object type or an incomplete type other than void; if an lvalue does not designate an object when it is evaluated, the behavior is undefined.

So, the same idea in short:

When -> was applied to the pointer, it yielded an lvalue where no object exists, and as a result the behavior is undefined.
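To make the blog's example concrete, here is a minimal C++ sketch of the ordering that sidesteps the question entirely: check the pointer first, and only then form &podhd->line6. The structures are hypothetical, simplified stand-ins for the driver types (only the names podhd and line6 come from the quoted expression); the same ordering applies unchanged in C.

```cpp
// Hypothetical, simplified stand-ins for the driver structures.
struct usb_line6 {
    int id;
};

struct usb_line6_podhd {
    int flags;
    usb_line6 line6;  // not the first member, so its offset is non-zero
};

// The member access podhd->line6 is evaluated only after the check,
// i.e. only when podhd actually designates an object.
usb_line6 *podhd_line6(usb_line6_podhd *podhd) {
    if (podhd == nullptr)
        return nullptr;
    return &podhd->line6;
}
```

Callers get nullptr back for a null argument instead of an address computed from a null base.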

This question is purely about the language; I am not asking whether a given system allows one to tamper with whatever lies at address 0 in any language.

As far as I can see, there is no restriction on dereferencing a pointer variable whose value is equal to nullptr, even though comparisons of a pointer against the nullptr constant (or (void *)0) may vanish during optimization in some situations because of the paragraphs cited. But that looks like a different issue, and it does not prevent dereferencing a pointer whose value is equal to nullptr. Note that I have checked other SO questions and answers, in particular this set of quotes, as well as the standard quotes above, and I did not stumble upon anything that clearly follows from the standard saying that if a pointer compares equal to nullptr, dereferencing it is undefined behavior.

The most I get is that dereferencing the constant (or the constant cast to any pointer type) is UB, but nothing says the same about a variable whose bit pattern is equal to the value that comes from nullptr.

I want to clearly distinguish the nullptr constant from a pointer variable whose value is equal to it. But an answer that addresses both cases would be ideal.
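On the constant versus variable distinction: in both standards a "null pointer" is a pointer value, and C11 6.3.2.3p4 states that converting a null pointer to another pointer type yields a null pointer of that type and that any two null pointers compare equal. The following C++ sketch (the function name is hypothetical) checks that a null value compares equal whether it came from 0, NULL, nullptr, a plain variable copy, or a trip through void *.

```cpp
#include <cstddef>  // NULL

// Every form below produces the same kind of value: a null pointer.
bool all_null_forms_equal() {
    int *from_literal = 0;        // null pointer constant 0
    int *from_macro = NULL;       // NULL macro
    int *from_nullptr = nullptr;  // C++11 nullptr
    int *copied = from_nullptr;   // plain variable copy, no constant involved
    void *as_void = copied;       // conversion to void * keeps it null

    return from_literal == from_macro && from_macro == from_nullptr &&
           copied == nullptr && as_void == nullptr &&
           static_cast<int *>(as_void) == from_literal;
}
```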

I understand that optimizations can kick in when there are comparisons against nullptr, etc., and the compiler may simply strip code based on them.
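The kind of code stripping mentioned above typically arises in a pattern like the following C++ sketch (hypothetical function names). In read_then_check the dereference comes first, so a compiler is allowed to assume p is non-null and treat the later check as dead code; check_then_read is well-defined for every argument, including nullptr.

```cpp
// Dereferences before checking. If p is null, the first statement is
// already undefined behavior, so an optimizer may assume p != nullptr
// and remove the check below.
int read_then_check(const int *p) {
    int value = *p;
    if (p == nullptr)
        return -1;  // may be removed as unreachable
    return value;
}

// Checks before dereferencing: defined for every argument.
int check_then_read(const int *p) {
    if (p == nullptr)
        return -1;
    return *p;
}
```

Only check_then_read may be called with nullptr; read_then_check is shown solely to illustrate the ordering problem.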

If the conclusion is that dereferencing a pointer whose value equals nullptr is definitely UB, the follow-up question is:

Do the C and C++ standards imply that a special value in the address space must exist solely to represent null pointers?

3 answers

Since you quote C: dereferencing a null pointer is clearly undefined behavior according to this standard quote (emphasis mine):

(C11, 6.5.3.2p4) "If an invalid value has been assigned to the pointer, the behavior of the unary * operator is undefined.102)"

102): "Among the invalid values for dereferencing a pointer by the unary * operator are a null pointer, an address inappropriately aligned for the type of object pointed to, and the address of an object after the end of its lifetime."

The exact same quote is in C99, and a similar one in C89/C90.
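The footnote's other two invalid values have well-known safe alternatives. A minimal C++ sketch with hypothetical helper names: a one-past-the-end pointer may be formed and compared, but the loop never dereferences it; and an address that may be misaligned for std::uint32_t is accessed with std::memcpy instead of a cast followed by *.

```cpp
#include <cstdint>
#include <cstring>

// One-past-the-end: forming and comparing `end` is fine, dereferencing it
// is not. The loop stops before *end would ever be evaluated.
int sum(const int *first, const int *end) {
    int total = 0;
    for (const int *p = first; p != end; ++p)
        total += *p;
    return total;
}

// Possibly misaligned address: instead of *(std::uint32_t *)bytes,
// copy the bytes, which has no alignment requirement.
std::uint32_t load_u32(const unsigned char *bytes) {
    std::uint32_t v;
    std::memcpy(&v, bytes, sizeof v);
    return v;
}

void store_u32(unsigned char *bytes, std::uint32_t v) {
    std::memcpy(bytes, &v, sizeof v);
}
```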


C++

[dcl.ref]/5:

There shall be no references to references, no arrays of references, and no pointers to references. The declaration of a reference shall contain an initializer (8.5.3) except when the declaration contains an explicit extern specifier (7.1.1), is a class member (9.2) declaration within a class definition, or is the declaration of a parameter or a return type (8.3.5); see 3.1. A reference shall be initialized to refer to a valid object or function. [Note: in particular, a null reference cannot exist in a well-defined program, because the only way to create such a reference would be to bind it to the "object" obtained by indirection through a null pointer, which causes undefined behavior. As described in 9.6, a reference cannot be bound directly to a bit-field. end note]

The note is of interest because it explicitly states that dereferencing a null pointer is undefined.

I'm sure the standard says this somewhere else in a more relevant context, but this is good enough.
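In practice this means a reference should only be bound after the pointer has been checked. A minimal C++ sketch with hypothetical helper names, where a pointer models the "maybe no object" case and a reference is created only from a pointer already known to be non-null:

```cpp
// A pointer can legitimately be null, so it models "maybe no object".
int value_or(const int *p, int fallback) {
    return p != nullptr ? *p : fallback;
}

// A reference parameter assumes an object always exists.
int twice(const int &r) {
    return 2 * r;
}

// Bind the reference only after the check; writing twice(*p) with a null p
// would be exactly the undefined "null reference" the note describes.
int twice_or_zero(const int *p) {
    if (p == nullptr)
        return 0;
    return twice(*p);  // *p designates an object here
}
```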


The answer I see to the question of how far a NULL value can be dereferenced is that it is intentionally left platform-dependent, because the conversions involved are left implementation-defined in C11 6.3.2.3p5 and p6. This is mainly to support freestanding implementations used to develop boot code for a platform, as the OP points out in his rebuttal link, but it also has applications for hosted implementations.

Re:
(C11, 6.5.3.2p4) "If an invalid value has been assigned to the pointer, the behavior of the unary * operator is undefined.102)"

102): "Among the invalid values for dereferencing a pointer by the unary * operator are a null pointer, an address inappropriately aligned for the type of object pointed to, and the address of an object after the end of its lifetime."

This is worded the way it is, AFAICT, because each of the cases in the footnote might NOT be invalid on particular platforms a compiler targets. If there is a defect, it is that "invalid value" should be italicized and defined as "implementation-defined". For the alignment case, a platform may be able to access any type at any address, so there is no alignment requirement, especially if address translation is supported; and a platform may assume that an object's lifetime ends only when the application exits, allocating a new frame via malloc() for automatic variables on every function call.

For null pointers, at boot time a platform may expect structures used by the processor to reside at specific physical addresses, including address 0, with those structures obtained as object pointers in the source code; or it may require the function that drives the boot process to use base address 0. If the standard did not allow expressions such as &podhd->line6 where the platform requires podhd to have base address 0, assembly language would be needed to access that structure. Similarly, a soft-reset function may need to dereference a 0-valued pointer as a call to a void function. A hosted implementation may treat 0 as the base of the executable image and map a NULL pointer in the source code to the header of that image after loading, since that structure must have logical address 0 for that instance of the C virtual machine.

What the standard calls pointers are treated more as handles into the virtual address space of the virtual machine, where object handles carry extra requirements on which operations are allowed on them. How a compiler emits code that honors these handle requirements on a particular processor is left undefined. What is efficient on one processor may not be on another, after all.

The requirement on (void *)0 is rather that the compiler emit code guaranteeing that, for expressions where the source uses (void *)0, explicitly or via NULL, the value actually stored is one that cannot point to any valid function definition or object, whatever mapping code that takes. It does not have to be 0! Similarly, casts of (void *)0 to (obj_type) and (func_type) are only required to yield assigned values that evaluate to addresses the compiler guarantees are not then used for objects or code. The difference with the latter is that those addresses are unused, not invalid, so they could potentially be dereferenced in a defined way.
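One practical consequence of the null value not having to be all-bits-zero is that zero-filling memory is not a portable way to obtain null pointers. A sketch in C++, assuming a hypothetical node type: value-initialization gives next a null value by definition, whereas std::memset only writes zero bytes, which happen to form a null pointer on mainstream platforms but are not required to.

```cpp
#include <cstring>

// Hypothetical node type used only for this illustration.
struct node {
    node *next;
    int value;
};

// Value-initialization: next becomes a null pointer and value becomes 0,
// whatever the object representation of a null pointer is.
node make_node_portable() {
    return node{};
}

// memset writes all-bits-zero. For int that is the value 0, but for a
// pointer it is a null pointer only if the platform represents null that
// way (true on mainstream platforms, not required by the standard).
node make_node_memset() {
    node n;
    std::memset(&n, 0, sizeof n);
    return n;
}
```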

The code that tests pointer equality would then check that, if one operand is one of these values, the other is one of the three, and not merely the same bit pattern, because it tags them with RTTI of a (null *) type, distinct from the void, obj, and func pointer types to defined objects. The standard could be more explicit that this is a separate type, even if unnamed, since compilers only use it internally, but I assume that is what the italicized term "null pointer" is meant to convey. Effectively, IMO, a '0' in these contexts is an additional compiler keyword token, because of the extra requirement of identifying its type as (null *), but it is not characterized as such because that would complicate the definition of <identifiers>.

This stored value can just as easily be SIZE_MAX as 0 for (void *)0 in the emitted application code, when an implementation, for example, defines the range 0 to SIZE_MAX-4*sizeof(void *) of the virtual machine as what is valid for code and data. The NULL macro could even be defined as (void *)SIZE_MAX, and the compiler would have to work out from context that it has the same semantics as 0. The casting code is responsible for recognizing this chosen value in pointer-to-pointer casts and supplying whatever is appropriate as an object or function pointer. Pointer-to-integer casts, implicit or explicit, have similar check-and-supply requirements, especially in unions where a (u)intptr_t field overlays a (type *) field. Portable code can guard against compilers that do not do this properly with an explicit *(ptr == NULL ? (type *)0 : ptr) expression.
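For the integer side of this, here is a minimal C++ sketch of a pointer round trip through std::uintptr_t (an optional type, assumed available here; the function name is hypothetical). The standard guarantees that converting a pointer to a large-enough integer and back yields the original pointer value, including a null pointer, while the integer in between is implementation-defined, so nothing portable can be assumed about it being 0. This uses casts rather than the union overlay the answer mentions.

```cpp
#include <cstdint>

// Pointer -> integer -> pointer. The round trip must give back an equal
// pointer; the intermediate integer value is implementation-defined,
// including the one produced from a null pointer.
void *round_trip(void *p) {
    std::uintptr_t bits = reinterpret_cast<std::uintptr_t>(p);
    return reinterpret_cast<void *>(bits);
}
```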


Source: https://habr.com/ru/post/1213573/

