Why is `i = ++i + 1` unspecified behavior?

Question

Consider the following quotation from the C++ Standard, ISO/IEC 14882:2003(E) (Clause 5, paragraph 4):

Except where noted, the order of evaluation of operands of individual operators and subexpressions of individual expressions, and the order in which side effects take place, is unspecified. 53) Between the previous and next sequence point a scalar object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be accessed only to determine the value to be stored. The requirements of this paragraph shall be met for each allowable ordering of the subexpressions of a full expression; otherwise the behavior is undefined. [Example:
 i = v[i++]; // the behavior is unspecified
 i = 7, i++, i++; // i becomes 9
 i = ++i + 1; // the behavior is unspecified
 i = i + 1; // the value of i is incremented
-end example]

I was surprised that i = ++i + 1 gives an undefined value of i. Does anybody know of a compiler implementation that does not give 2 for the following case?

 int i = 0;
 i = ++i + 1;
 std::cout << i << std::endl;

The thing is that operator= has two arguments. The first one is always a reference to i. The order of evaluation does not matter in this case. I do not see any problem except the C++ Standard's taboo.

Please do not consider cases where the order of the arguments is important for evaluation. For example, ++i + i is obviously undefined. Please consider only my case, i = ++i + 1.

Why does the C++ Standard prohibit such expressions?

+44

c++ variable-assignment standards

Alexey Malistov Dec 07 '09 at 14:56

15 answers

It is undefined behavior, not (just) unspecified behavior, because there are two writes to i without an intervening sequence point. It is this way by definition, as far as the standard specifies.

The standard allows compilers to generate code that delays writes back to storage (or, from another point of view, to resequence the instructions that implement side effects) in any way they choose, as long as the requirements of sequence points are met.

The issue with this expression is that it implies two writes to i without an intervening sequence point:

 i = i++ + 1;

One write is for the original value of i "plus one", and the other is for that value "plus one" again. As far as the standard is concerned, these writes could happen in any order or blow up completely. Theoretically, it even gives implementations the freedom to perform the write-backs in parallel without bothering to check for simultaneous access errors.
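One way to remove the double write is to give each modification its own full expression, so that a sequence point separates them. The following is only a sketch, assuming the intent is "increment, then add one"; the names `increment_then_add_one` and `check_split_statement` are made up for illustration.

```cpp
#include <cassert>

// Hypothetical helper: the same arithmetic as `i = ++i + 1`, but each write
// to i sits in its own statement, so a sequence point separates the two.
int increment_then_add_one(int i)
{
    ++i;        // first write to i; the full expression ends here
    i = i + 1;  // second write, sequenced after the first
    return i;
}

// Usage sketch.
void check_split_statement()
{
    assert(increment_then_add_one(0) == 2);
}
```

Written this way, the result does not depend on which revision of the standard the compiler follows.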

+37

Charles Bailey Dec 07 '09 at 15:09

C/C++ defines a concept called sequence points, which refer to points in execution where it is guaranteed that all effects of previous evaluations have already been performed. The expression i = ++i + 1 is undefined because it increments i and also assigns to i, and neither of these operations is a sequence point on its own. Therefore, it is unspecified which will happen first.
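For contrast, here is a small sketch of an expression that modifies i twice but does contain a sequence point: the built-in && operator finishes its left operand, side effects included, before evaluating the right one. The function names are hypothetical and exist only for this illustration.

```cpp
#include <cassert>

// The built-in && evaluates its left operand completely, side effects
// included, before it touches the right operand.
int increments_across_logical_and()
{
    int i = 0;
    const bool both_true = (++i == 1) && (++i == 2);
    return both_true ? i : -1;  // i has been incremented twice, in order
}

// Usage sketch.
void check_logical_and()
{
    assert(increments_across_logical_and() == 2);
}
```

The assignment operator offers no such guarantee between the increment's side effect and its own store, which is the difference the answer above describes.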

+15

Charles Salvia Dec 07 '09 at 15:04

Update for C++11 (09/30/2011)

Stop, this is well defined in C++11. It was undefined only in C++03, but C++11 is more flexible.

 int i = 0;
 i = ++i + 1;

After this line, i will be 2. The reason for the change was that it already works in practice, and it would have been more work to make it undefined than to just leave it defined by the rules of C++11. (Actually, the fact that this works now is more of an accident than a deliberate change, so please don't do it in your code!)

Straight from the horse's mouth:

http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#637

+10

Johannes Schaub - litb Sep 30

Given two choices, defined or undefined, which would you have chosen?

The authors of the standard had two options: define the behavior or specify it as undefined.

Given the clearly unwise nature of writing such code in the first place, it makes no sense to specify a result for it. One would want to discourage code like that, not encourage it. It is not useful or necessary for anything.

Furthermore, standards committees have no way to force compiler writers to do anything. Had they required a specific behavior, it is likely that the requirement would have been ignored.

There are practical reasons as well, but I suspect they were subordinate to the general consideration above. For the record, though, any required behavior for this kind of expression and related kinds restricts the compiler's ability to generate code, to factor out common subexpressions, to move objects between registers and memory, and so on. C was already handicapped by weak visibility restrictions. Languages like Fortran realized long ago that aliased parameters and globals were an optimization killer, and I believe they simply prohibited them.

I know you were interested in a specific expression, but the exact nature of any given construct does not matter very much. It is not easy to predict what a complex code generator will do, and the language tries not to require those predictions in silly cases.

+9

DigitalRoss Dec 07 '09 at 15:09

The important part of the standard is:

its stored value modified at most once by the evaluation of an expression

You modify the value twice: once with the ++ operator and once with the assignment.

+8

Trent Dec 07 '09 at 15:08

Please note that your copy of the standard is outdated and contains a known (and fixed) error, precisely in the 1st and 3rd code lines of your example. See:

C++ Standard Core Language Issue Table of Contents, Revision 67, #351

and

Andrew Koenig: Sequence point error: unspecified or undefined?

The topic is not easy to grasp just by reading the standard (which is pretty obscure :( in this case).

For example, whether an expression is well defined, unspecified, or undefined in the general case depends not only on the structure of the statement but also on the contents of memory (to be specific, the values of variables) at the moment of execution. Another example:

 ++i, ++i; //ok
 (++i, ++j) + (++i, ++j); //ub, see the first reference below (12.1 - 12.3)
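A simpler illustration of the same dependence on run-time values uses pointers. Under the C++03 rules quoted in the question, a one-line form such as `*dst = ++*src + 1;` modifies two different objects when the pointers differ, but modifies one object twice when they point to the same int. The sketch below sidesteps the problem by splitting the statement; the function names are hypothetical.

```cpp
#include <cassert>

// Hypothetical helper. Each statement performs exactly one write, so the
// function behaves the same whether or not dst and src alias.
void store_incremented_plus_one(int* dst, int* src)
{
    ++*src;           // one write to *src
    *dst = *src + 1;  // one write to *dst, in a separate full expression
}

// Usage sketch: distinct objects first, then the aliased case.
void check_store_incremented_plus_one()
{
    int a = 0, b = 0;
    store_incremented_plus_one(&a, &b);
    assert(b == 1 && a == 2);

    int c = 0;
    store_incremented_plus_one(&c, &c);
    assert(c == 2);
}
```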

Please have a look at the following (it explains everything clearly and precisely):

JTC1/SC22/WG14 N926 "Sequence Point Analysis"

Angelika Langer also has an article on the topic (though not as clear as the previous one):

"Sequence points and expression evaluation in C++"

There was also a discussion in Russian (though with some apparently erroneous statements in the comments and in the post itself):

"Sequence points" (Точки следования)

+7

mlvljr Dec 07 '09 at 16:10

Assuming you are asking "Why is the language designed this way?":

You say that i = ++i + i is "obviously undefined", but i = ++i + 1 should leave i with a defined value? Frankly, that would not be very consistent. I prefer to have either everything perfectly defined or everything consistently unspecified. In C++ I have the latter. It is not a terribly bad choice in itself. For one thing, it prevents you from writing evil code that makes five or six modifications in the same "statement".

+4

Daniel Daranas Dec 07 '09 at 15:15

The following code demonstrates how you can get the wrong (unexpected) result:

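 #include <iostream>
 using namespace std;

 int main()
 {
     int i = 0;
     // __asm blocks are MSVC-specific inline assembly (32-bit x86 builds)
     __asm { // here standard conformant implementation of i = ++i + 1
         mov eax, i;
         inc eax;
         mov ecx, 1;
         add ecx, eax;
         mov i, ecx;

         mov i, eax; // delayed write
     };

     cout << i << endl;
 }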

As a result, it will print 1.
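For readers without a compiler that accepts `__asm` blocks, the same idea can be modeled in portable C++. This is only a sketch of what an implementation is allowed to do, not a description of any real compiler: the two pending stores are written out explicitly, and a flag (a made-up parameter, `increment_written_last`) chooses which one lands last.

```cpp
#include <cassert>

// Hypothetical model of the two stores hidden inside `i = ++i + 1`.
int simulate_two_writes(int i, bool increment_written_last)
{
    const int incremented = i + 1;            // value produced by ++i
    const int assigned    = incremented + 1;  // value of ++i + 1
    if (increment_written_last) {
        i = assigned;     // store from the assignment
        i = incremented;  // delayed store from ++i lands last
    } else {
        i = incremented;  // store from ++i
        i = assigned;     // store from the assignment lands last
    }
    return i;
}

// Usage sketch: starting from 0, the delayed-write order yields 1.
void check_simulate_two_writes()
{
    assert(simulate_two_writes(0, true) == 1);
    assert(simulate_two_writes(0, false) == 2);
}
```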

+4

Kirill V. Lyadvinsky Dec 08 '09 at 7:30

An argument by analogy: if you think of operators as a kind of function, then it sort of makes sense. If you had a class with an overloaded operator=, your assignment statement would be equivalent to something like this:

 operator=(i, ++i+1)

(The first parameter is actually passed implicitly using the this pointer, but this is only for illustration.)

For a plain function call, this is obviously undefined. The value of the first argument depends on when the second argument is evaluated. With primitive types, however, you get away with it because the original value of i is simply overwritten; its value does not matter. But if you were doing some other magic in your own operator=, the difference could surface.

Simply put: all operators act like functions and should therefore behave according to the same notions. If i + ++i is undefined, then i = ++i should be undefined as well.

+3

int3 Dec 07 '09 at 15:29

How about we all just agree to never, ever write code like this? If the compiler does not know what you want to do, how do you expect the poor sap who comes along after you to understand what you wanted to do? Putting i++; on its own line will not kill you.

+2

Ken Lange Dec 07 '09 at 15:46

 i = v[i++]; // the behavior is unspecified
 i = ++i + 1; // the behavior is unspecified

Both of the expressions above invoke undefined behavior.

 i = 7, i++, i++; // i becomes 9

This is fine.
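The comma case can be checked directly, because the built-in comma operator puts a sequence point between its operands. A minimal sketch (the function names are invented for this illustration):

```cpp
#include <cassert>

// The statement parses as ((i = 7), i++), i++ and each comma is a sequence
// point, so the three writes to i happen strictly left to right.
int comma_operator_example()
{
    int i = 0;
    i = 7, i++, i++;
    return i;
}

// Usage sketch.
void check_comma_operator_example()
{
    assert(comma_operator_example() == 9);
}
```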

Read Steve Summit's C FAQ.

+1

Prasoon Saurav Dec 07 '09 at 15:14

The underlying reason is the way the compiler handles reading and writing of values. The compiler is allowed to keep an intermediate value around and only actually commit it to memory at the end of the expression. We read the expression ++i as "increase i by one and return it", but a compiler might see it as "load the value of i, add one, return it, and commit it back to memory before someone uses it again". The compiler is encouraged to avoid reading from and writing to the actual memory location as much as possible, because that would slow the program down.

The specific case of i = ++i + 1 suffers largely from the need for consistent behavioral rules. Many compilers will do the "right thing" in such a situation, but what if one of the i's was actually a pointer, pointing to i? Without this rule, the compiler would have to be very careful to perform the loads and stores in the right order. The rule serves to allow more optimization opportunities.

A similar case is the so-called strict aliasing rule. You cannot assign a value (say, an int) through a value of an unrelated type (say, a float), with only a few exceptions. This keeps the compiler from having to worry that some float * in use will change the value of an int, and it greatly improves optimization potential.
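As a side illustration of the strict aliasing point, the usual way to look at a float's bits without going through an unrelated pointer type is to copy the bytes. This is a sketch under two stated assumptions: float is 32 bits wide (checked at compile time) and uses the common IEEE 754 layout (assumed, not checked). The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "this sketch assumes a 32-bit float");

// Hypothetical helper: copies the object representation of a float into an
// integer of the same size instead of reading it through a uint32_t*.
std::uint32_t float_bits(float value)
{
    std::uint32_t bits = 0;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}

// Usage sketch; the expected pattern assumes IEEE 754 single precision.
void check_float_bits()
{
    assert(float_bits(1.0f) == 0x3f800000u);
}
```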

+1

coppro Dec 07 '09 at 15:31

The problem here is that the standard allows a compiler to completely reorder the evaluation of a single statement. However, it is not allowed to reorder statements relative to each other (if such reordering would change program behavior). Therefore, the expression i = ++i + 1; may be evaluated in more than one way:

 ++i; // i = 2
 i = i + 1;

or

 i = i + 1; // i = 2
 ++i;

or

 i = i + 1; ++i; //(Running in parallel using, say, an SSE instruction) i = 1

This gets even worse when user-defined types are thrown into the mix, where the ++ operator can have whatever effect the author of the type wants, in which case the order used in evaluation matters significantly.

+1

Billy ONeal Dec 07 '09 at 15:37

From ++i, i must be assigned "1", but with i = ++i + 1 it must be assigned the value "2". Since there is no intervening sequence point, the compiler can assume that the same variable is not being written twice, so these two operations can be done in any order. So yes, the compiler would be correct if the final value is 1.

0

Mister Dec 07 '09 at 15:56

Rob Kennedy · Accepted Answer · 2009-12-07 15:30

You make the mistake of thinking of operator= as a two-argument function, where the side effects of the arguments must be completely evaluated before the function begins. If that were the case, then the expression i = ++i + 1 would have multiple sequence points, and ++i would be fully evaluated before the assignment began. That is not the case, though. What is being evaluated is the intrinsic assignment operator, not a user-defined operator. There is only one sequence point in that expression.

The result of ++i is evaluated before the assignment (and before the addition operator), but the side effect is not necessarily applied right away. The result of ++i + 1 is always the same as i + 2, so that is the value that gets assigned to i as part of the assignment operator. The result of ++i is always i + 1, so that is what gets assigned to i as part of the increment operator. There is no sequence point to control which value gets assigned first.

Since the code violates the rule that "between the previous and next sequence point a scalar object shall have its stored value modified at most once by the evaluation of an expression," the behavior is undefined. Practically, though, it is likely that either i + 1 or i + 2 will be assigned first, then the other value will be assigned, and finally the program will continue running as usual: no nasal demons or exploding toilets, and no i + 3, either.
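The contrast with a user-defined operator can be sketched with a small class. `Counter`, its log, and the helper functions are hypothetical and exist only for this illustration. Because every operator here is an ordinary function call, the nesting of the calls fixes the order: the increment runs first, then the addition, then the assignment.

```cpp
#include <cassert>
#include <string>

// Hypothetical class, for illustration only.
struct Counter {
    int value;
    std::string* log;  // records the order in which the operators run

    Counter(int v, std::string* l) : value(v), log(l) {}
    Counter(const Counter& other) : value(other.value), log(other.log) {}

    Counter& operator++()  // prefix increment
    {
        *log += "++ ";
        ++value;
        return *this;
    }

    Counter& operator=(const Counter& other)
    {
        *log += "= ";
        value = other.value;
        return *this;
    }
};

Counter operator+(const Counter& lhs, int rhs)
{
    *lhs.log += "+ ";
    return Counter(lhs.value + rhs, lhs.log);
}

int user_defined_example(std::string& order)
{
    Counter c(0, &order);
    c = ++c + 1;  // c.operator=(operator+(c.operator++(), 1))
    return c.value;
}

// Usage sketch.
void check_user_defined_example()
{
    std::string order;
    assert(user_defined_example(order) == 2);
    assert(order == "++ + = ");
}
```

With a plain int, none of these function-call boundaries exist, which is why the built-in form fell under the "modified at most once" rule quoted above.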
