At what point does integer overflow in a loop become undefined behavior?

This is an example to illustrate my question, which involves much more complicated code that I cannot post here.

    #include <stdio.h>

    int main()
    {
        int a = 0;
        for (int i = 0; i < 3; i++) {
            printf("Hello\n");
            a = a + 1000000000;
        }
    }

This program contains undefined behavior on my platform because a will overflow on the third iteration of the loop.

Does the whole program have undefined behavior, or only once the overflow actually happens? Could the compiler potentially work out that a will overflow, so that it can declare the whole loop undefined and not bother to run the printfs, even though they all happen before the overflow?

(Tagged both C and C++, even though they differ, because I would be interested in the answers for both languages, if they are different.)

+84
c++ c undefined-behavior integer-overflow
Oct 07 '16 at 10:12
12 answers

If you are interested in a purely theoretical answer, the C++ standard permits undefined behavior to "time travel":

[intro.execution]/5: A conforming implementation executing a well-formed program shall produce the same observable behavior as one of the possible executions of the corresponding instance of the abstract machine with the same program and the same input. However, if any such execution contains an undefined operation, this International Standard places no requirement on the implementation executing that program with that input (not even with regard to operations preceding the first undefined operation)

Thus, if your program contains undefined behavior, then the behavior of your entire program will be undefined.

+105
Oct 07 '16 at 10:32

First, let me correct the title of this question:

Undefined Behavior is not (specifically) a matter of execution.

Undefined Behavior affects all stages: compilation, linking, loading, and execution.

Some examples to cement this; keep in mind that none of these lists is exhaustive:

  • the compiler may assume that portions of code containing Undefined Behavior are never executed, and therefore that the execution paths leading to them are dead code. See What Every C Programmer Should Know About Undefined Behavior by Chris Lattner.
  • the linker may assume that, in the presence of multiple definitions of a weak symbol (recognized by name), all definitions are identical, thanks to the One Definition Rule
  • the loader (in case you use dynamic libraries) may assume the same, and thus pick the first symbol it finds; this is usually (ab)used to intercept calls via LD_PRELOAD tricks on Unixes
  • execution may crash (SIGSEGV) if you use dangling pointers.

This is what is so scary about Undefined Behavior: it is nearly impossible to predict ahead of time what exact behavior will occur, and this prediction has to be revisited at every update of the toolchain or the underlying OS...




I recommend watching this video by Michael Spencer (LLVM Developer): CppCon 2016: My Little Optimizer: Undefined Behavior is Magic.

+30
Oct 07 '16 at 13:44

An aggressively optimizing C or C++ compiler targeting a 16-bit int will know that the behavior of adding 1000000000 to an int is undefined.

It is allowed by the standard to do anything, which may include deleting the entire program, leaving only int main(){} .

But what about bigger int s? I do not know of a compiler that does this yet (and I am no expert in C and C++ compiler design by any means), but I imagine that someday a compiler targeting 32-bit int or higher will figure out that the loop is finite ( i is not otherwise modified), and therefore that a will eventually overflow. So, once again, it can optimize the result down to int main(){} . The point I am trying to make here is that as compiler optimizations get progressively more aggressive, more and more undefined-behavior constructs end up breaking in unexpected ways.

Note that an endless loop would not in itself be undefined, since you write to standard output in the loop body.

+28
Oct 07 '16 at 10:16

Technically, under the C++ standard, if a program contains undefined behavior, the behavior of the entire program, even at compile time (before the program is even executed), is undefined.

In practice, since the compiler may assume (as part of an optimization) that the overflow will not occur, at the very least the behavior of the program from the third iteration of the loop onward (assuming a 32-bit machine) will be undefined, although it is likely you would get correct results before the third iteration. However, since the behavior of the entire program is technically undefined, nothing prevents the program from generating completely incorrect output (including no output at all), crashing at any point during execution, or even failing to compile altogether (as undefined behavior extends to compile time).

Undefined behavior gives the compiler more room to optimize, because it eliminates certain assumptions about what the code must do. In doing so, programs that rely on assumptions involving undefined behavior are not guaranteed to work as expected. As such, you should not rely on any particular behavior that is considered undefined by the C++ standard.

+11
Oct 08 '16 at 18:41

To understand why undefined behavior can "travel in time", as @TartanLlama aptly put it, let's take a look at the "as-if" rule:

1.9 Program Execution

1 The semantic descriptions in this International Standard define a parameterized nondeterministic abstract machine. This International Standard places no requirement on the structure of conforming implementations. In particular, they need not copy or emulate the structure of the abstract machine. Rather, conforming implementations are required to emulate (only) the observable behavior of the abstract machine as explained below.

With this, we can view the program as a "black box" with an input and an output. The input could be user input, files, and much more. The output is the "observable behavior" mentioned in the standard.

The standard defines only a mapping between input and output, nothing more. It does this by describing an exemplary black box, but explicitly says that any other black box with the same mapping is equally valid. This means the contents of the black box are irrelevant.

With that in mind, it makes no sense to say that undefined behavior occurs at a particular moment. For the exemplary black-box implementation we could say where and when it happens, but the actual black box could be something completely different, so we can no longer say where and when it happens. In theory, a compiler could, for example, decide to enumerate all possible inputs and precompute the resulting outputs. The undefined behavior would then have happened during compilation.

Undefined behavior is the absence of a mapping between input and output. A program can have undefined behavior for some inputs but defined behavior for others. In that case the mapping between input and output is simply incomplete: there are inputs for which no mapping to an output exists.
The program in the question has undefined behavior for every input, so the mapping is empty.

+9
Oct 07 '16 at 20:09

Assuming int is 32-bit, the undefined behavior happens at the third iteration. So if, for example, the loop were only conditionally reachable, or could be conditionally terminated before the third iteration, there would be no undefined behavior unless the third iteration is actually reached. However, in the event of undefined behavior, all output of the program is undefined, including output that is "in the past" relative to the invocation of the undefined behavior. For example, in your case this means there is no guarantee of seeing 3 "Hello" messages in the output.

+6
Oct 07 '16 at 2:48

TartanLlama's answer is correct. Undefined behavior can happen at any time, even during compilation. This may seem absurd, but it is a key feature that permits compilers to do what they need to do. It is not always easy to be a compiler. You must do what the spec says, every time. However, it can sometimes be monstrously difficult to prove that a particular behavior is occurring. If you remember the halting problem, it is rather trivial to craft software for which you cannot prove whether it completes or enters an infinite loop when fed a particular input.

We could make compilers pessimistic, constantly compiling in fear that the next instruction might be one of these halting-problem-like issues, but that is not reasonable. Instead, we give the compiler a pass: on these "undefined behavior" topics, it is freed from any responsibility. Undefined behavior consists of all the behaviors that are so subtly nefarious that we have trouble separating them from the really nasty halting-problem cases and such.

There is an example I love to post, although I admit I lost the source, so I have to paraphrase. It was from a particular version of MySQL. In MySQL, they had a circular buffer that was filled with user-provided data. They, of course, wanted to make sure the data did not overflow the buffer, so they had a check:

    if (currentPtr + numberOfNewChars > endOfBufferPtr) {
        doOverflowLogic();
    }

It looks sane enough. However, what if numberOfNewChars is really big, and overflows? Then it wraps around and becomes a pointer smaller than endOfBufferPtr , so the overflow logic would never get called. So they added a second check, before that one:

    if (currentPtr + numberOfNewChars < currentPtr) {
        detectWrapAround();
    }

It looks like they took care of the buffer overflow error, right? However, a bug was submitted stating that this buffer overflowed on one particular version of Debian! Careful investigation showed that this version of Debian was the first to use a particularly bleeding-edge version of gcc. On this version of gcc, the compiler recognized that currentPtr + numberOfNewChars can never be a smaller pointer than currentPtr, because overflow for pointers is undefined behavior! That was sufficient for gcc to optimize the entire check out, and suddenly you were not protected against buffer overflows, even though you wrote the code to check for them!

This was behavior per spec. Everything was legal (although, from what I heard, gcc rolled this change back in the next version). It is not what I would consider intuitive behavior, but if you stretch your imagination a little, it is easy to see how a slight variant of this situation could become a halting problem for the compiler. Because of this, the spec writers made it "Undefined Behavior" and stated that the compiler could do absolutely anything at all.

+6
Oct 07 '16 at 21:04
Undefined behavior is, by definition, a gray area. You simply cannot predict what it will or won't do; that is what "undefined behavior" means.

Since time immemorial, programmers have always tried to salvage remnants of definedness from an undefined situation. They have some code they really want to use, but which turns out to be undefined, so they try to argue: "I know it is undefined, but surely it will, at worst, do this or that; it will never do that." And sometimes these arguments are more or less right, but often they are wrong. And as compilers get smarter and smarter (or, some people might say, sneakier and sneakier), the boundaries of the question keep changing.

So really, if you want to write code that is guaranteed to work, and to keep working for a long time, there is only one choice: avoid undefined behavior at all costs. Verily, if you traffic in it, it will come back to haunt you.

+4
Oct 08 '16 at 0:25

In addition to the theoretical answers, a practical observation would be that for a long time compilers have applied various transformations to loops to reduce the amount of work done within them. For example, given:

    for (int i=0; i<n; i++)
        foo[i] = i*scale;

the compiler can convert this to:

    int temp = 0;
    for (int i=0; i<n; i++) {
        foo[i] = temp;
        temp += scale;
    }

thus saving a multiplication on every loop iteration. An additional form of optimization, which compilers have adopted with varying degrees of aggressiveness, would turn this into:

    if (n > 0) {
        int temp1 = n*scale;
        int *temp2 = foo+n;
        do {
            temp1 -= scale;
            *--temp2 = temp1;
        } while(temp1);
    }

Even on machines with silent wraparound on overflow, this could cause a malfunction if there were some number less than n which, when multiplied by scale, would yield 0, since temp1 would hit zero and terminate the loop early. It could also turn into an endless loop if scale were read from memory before the loop started and something unexpectedly changed its value (in any case where "scale" could change mid-loop without invoking UB, the compiler would not be permitted to perform the optimization).

While most such optimizations cause no trouble in cases where two short unsigned types are multiplied to produce a value between INT_MAX+1 and UINT_MAX, gcc has some cases where such a multiplication within a loop may cause the loop to terminate early. I have not noticed such behaviors stemming from comparison instructions in the generated code, but they are observable in cases where the compiler uses the overflow to infer that the loop can execute at most 4 or fewer times; by default it does not generate warnings in cases where some inputs would cause UB and others would not, even when its inferences cause the loop's upper bound to be ignored.

+4
Oct 10 '16 at 15:27

One thing your example does not consider is optimization. a is set in the loop but never used, and an optimizer can work this out. As such, it is perfectly legal for the optimizer to discard a altogether, and in that case all the undefined behavior vanishes like the victim of a boojum.

Which is, of course, itself undefined, because the optimization is undefined. :)

+1
Oct 07 '16 at 16:40

Since this question is dual-tagged C and C++, I will try to address both. C and C++ take different approaches here.

In C, an implementation must be able to prove that undefined behavior will be invoked in order to treat the whole program as if it had undefined behavior. In the OP's example it would seem trivial for the compiler to prove this, and therefore it is as if the whole program were undefined.

We can see this from Defect Report 109, which asks:

If, however, the C Standard recognizes the separate existence of "indeterminate values" (whose mere creation does not involve wholly "undefined behavior"), then a person doing compiler testing could write a test case such as the following, and he/she could also expect (or possibly demand) that a conforming implementation should, at the very least, compile this code (and possibly also allow it to execute) without "failure".

    int array1[5];
    int array2[5];
    int *p1 = &array1[0];
    int *p2 = &array2[0];

    int foo()
    {
        int i;
        i = (p1 > p2); /* Must this be "successfully translated"? */
        1/0;           /* Must this be "successfully translated"? */
        return 0;
    }

So the basic question is: must the above code be "successfully translated" (whatever that means)? (See the footnote attached to subclause 5.1.1.3.)

and the answer was:

The C Standard uses the term "indeterminately valued", not "indeterminate value". Use of an indeterminately valued object results in undefined behavior. The footnote to subclause 5.1.1.3 points out that an implementation is free to produce any number of diagnostics as long as a valid program is still correctly translated. If an expression whose evaluation would result in undefined behavior appears in a context where a constant expression is required, the containing program is not strictly conforming. Furthermore, if every possible execution of a given program would result in undefined behavior, the given program is not strictly conforming. A conforming implementation must not fail to translate a strictly conforming program simply because some possible execution of that program would result in undefined behavior. Because foo might never be called, the example given must be successfully translated by a conforming implementation.

In C++ the approach seems more relaxed, and would suggest that a program has undefined behavior regardless of whether the implementation can prove it statically or not.

We have [intro.abstract] p5, which says:

A conforming implementation executing a well-formed program shall produce the same observable behavior as one of the possible executions of the corresponding instance of the abstract machine with the same program and the same input. However, if any such execution contains an undefined operation, this document places no requirement on the implementation executing that program with that input (not even with regard to operations preceding the first undefined operation).

0
Oct 10 '18 at 17:28

The top-voted answer rests on, in my opinion, a common misconception:

Undefined behavior is a runtime property*. It CANNOT "time-travel"!

Certain operations are defined (by the standard) to have side effects and cannot be optimized away. Operations that perform I/O or that access volatile variables fall into this category.

However, there is a caveat: UB is invoked only by statements that actually execute. If a statement is never executed, it cannot trigger UB, and a program whose execution never reaches the undefined operation behaves normally up to that point.

Of course, the compiler may observe that a statement is undefined (or that every path from some point provably leads to an undefined statement), and use that knowledge in its optimizations. But it cannot reach back across observable side effects that have already become visible: once output has been produced and observed, no later UB can "un-do" it. The following is an example.

* Qualification: the standard does not literally say that UB is a runtime property, and certain kinds of UB do pertain to translation rather than execution; those cases are a separate matter from the one discussed here.




Suppose a program prints something, waits for input, and then executes a statement whose behavior is undefined:

    printf("foo");
    getchar();
    *(char*)1 = 1;

The claim is that "foo" was output before the UB occurred, so the UB cannot "time-travel" back and suppress it.

Since getchar() does not return until the user has typed something, everything before it, including the output of foo , must already have happened; there is no "un-doing" it afterwards.

One might object that the output could still be sitting in a buffer (i.e., not yet flushed). Did the printf then really happen? Was "foo" really output? There are two possibilities:

  • either the output became observable before the UB statement executed, in which case it is an observable side effect that the UB cannot revoke;

  • or the output never became observable at all, in which case nothing the user could see has been "un-done".

Either way, the argument goes, whatever output from printf the user actually observes is consistent with an execution in which the UB occurred only afterwards; the UB cannot retroactively make already-observed behavior "un-happen".

-1
Oct 09 '16 at 5:12


