How can I tell whether behavior is undefined in C?

I know that the following is undefined because I'm trying to read and write the value of a variable in the same expression:

int a=5; a=a++; 

But if so, why is the next piece of code not undefined?

 int a=5; a=a+1; 

Here, too, I'm reading the value of a and writing to it in the same expression.

Also, please explain why the standard does not forbid or eliminate this undefined behavior, even though its authors know it is undefined.

+7
c undefined-behavior sequence-points
5 answers

In short, every defined behavior is specified in the standard. Anything the standard does not define is undefined.

An intuitive explanation of your example:

 a=a++; 

You want to change the variable a twice in one expression.

 1) a=   // first time
 2) a++  // second time

If you look here:

 a=a+1; 

You change the variable a only once:

 a= // (a+1) - doesn't change the value of a 
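As a minimal sketch of the "modify only once" rule described above, the functions below each add one to a variable in a well-defined way. The function names are hypothetical and exist only for illustration; the undefined form a=a++ is deliberately left out of the code.

```c
/* Three well-defined ways to add one to a variable.
   Each one modifies `a` exactly once per expression.
   All function names are hypothetical, chosen for this sketch. */

/* Plain assignment, as in the question's a=a+1:
   a is read to compute a+1, then stored once. */
int inc_by_assignment(int a)
{
    a = a + 1;
    return a;
}

/* Compound assignment: still a single modification of a. */
int inc_by_compound(int a)
{
    a += 1;
    return a;
}

/* Postfix increment used as a statement on its own:
   one modification, the expression's value is discarded. */
int inc_by_postfix(int a)
{
    a++;
    return a;
}

/* Deliberately NOT written here: a = a++;
   That expression modifies a twice without a sequence point in between. */
```

Calling any of the three with 5 returns 6, which is the result the question's author was presumably after.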

Why doesn't the standard define a=a++ behavior?

One possible reason: the compiler may perform optimizations. The more cases the standard defines, the less freedom the compiler has to optimize your code. Since different architectures may have different increment instructions, the compiler could not use some processor instructions if their behavior conflicted with what the standard required. And in some cases the compiler may want to change the order of evaluation; a mandated order would force it to give up such optimizations just to cover code that modifies something twice.

+3

why the following code fragment is not undefined

 int a=5; a=a+1; 

The standard states that

Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be read only to determine the value to be stored.

In the case of a = a + 1; a is modified only once, and the prior value of a is read only to determine the value to be stored in a.
In the case of a=a++; however, a is modified more than once: by the ++ operator in the a++ subexpression and by the = operator when the result is assigned to the left-hand a. It is not defined which modification, ++ or =, takes effect first.

Almost every modern compiler, given the -Wall flag, will issue a warning when compiling the first fragment, for example:

 [Warning] operation on 'a' may be undefined [-Wsequence-point] 

Further reading: How can I understand complex expressions like the ones in this section, and avoid writing undefined ones?

+6

The reason this is undefined is not that you read and write; it is that you write twice.

a++ means reading a and incrementing it after the read, but we don't know whether the ++ will happen before the assignment with = (in which case = overwrites the increment with the old value of a) or after it, in which case a ends up incremented.

Just use a++; :)

a = a + 1 has no problem, since a is written only once.
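The two orderings mentioned above can be spelled out as separate, well-defined statements. This is only an illustrative sketch with hypothetical function names: because a=a++ is undefined, a compiler is not obliged to produce either of these results.

```c
/* Two plausible readings of a = a++, written as well-defined code.
   Function names are hypothetical. A real compiler is NOT limited to
   these two outcomes, since the original expression is undefined. */

/* Reading 1: the increment lands first, then the assignment
   stores the old value on top of it. */
int increment_then_assign(int a)
{
    int old = a;   /* value of the expression a++ */
    a = a + 1;     /* side effect of ++ */
    a = old;       /* assignment overwrites the increment */
    return a;
}

/* Reading 2: the assignment stores the old value first,
   then the increment lands last. */
int assign_then_increment(int a)
{
    int old = a;   /* value of the expression a++ */
    a = old;       /* assignment happens first */
    a = a + 1;     /* side effect of ++ */
    return a;
}
```

Starting from 5, the first reading leaves a at 5 and the second leaves it at 6, which is why writing the intended order out explicitly is the safer choice.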

+6

The ++ operator adds one to a, that is, the variable a becomes a + 1. In fact, the following two statements are equivalent:

 a++;
 a = a + 1;

The expression a + 1 on its own does not increase a; it only produces a result with the value a + 1. If you want a to become a + 1, you need to assign that result back to a:

 a = a + 1; 

The reason your first expression does not work is that you are effectively writing something like

 a = (a = a + 1); 
+3

Others have already talked about the details of your specific example, so I will add some general information and tools to help you catch undefined behavior.

There is no definitive tool or method for catching undefined behavior, so even if you use all of these tools, there is no guarantee that nothing in your code is undefined. But IME they will catch quite a few common problems. I do not list standard software development best practices, such as unit testing, that you should use anyway.

  • clang (-analyze) has several options that can help catch undefined behavior, both at compile time and at run time. It has -ftrapv, newly added support for canaries, its address sanitizer, -fcatch-undefined-behavior, etc.

  • gcc also has several options to catch undefined behavior, such as mudflap, its address sanitizer, and the stack protector.

  • valgrind is a fantastic tool for finding memory-related undefined behavior at run time.

  • frama-c is a static analysis tool that can find and visualize undefined behavior. Its ability to find dead code (undefined behavior can often cause other parts of the code to become dead) is pretty useful for tracking down potential security issues. frama-c has many more advanced features, but it can be harder to use than ...

  • Other commercial static analysis tools that can catch undefined behavior exist, for example, PVS-studio, klocwork, etc. Usually they are expensive.

  • Compiling with various compilers and for unusual architectures. If you can, why not compile and run your code on an 8-bit AVR chip? A Raspberry Pi (32-bit ARM)? Compile it to JavaScript using emscripten and run it in V8? This tends to be a practical way of catching undefined behavior that would cause malfunctions down the line (but it does little or nothing to catch lurking UB that could, for example, cause security problems).
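To try some of the tools above, a small target with a known overflow hazard can help. The sketch below is an assumption-laden illustration: the function names are hypothetical, and the build commands in the comment use flags that exist in current gcc and clang (-Wall, -ftrapv, -fsanitize=undefined), though the exact diagnostics vary by compiler and version.

```c
#include <limits.h>

/* Hypothetical example function: doubles x.
   The multiplication is well-defined only while the result fits in int.
   Outside that range it overflows, which is undefined behavior. */
int twice(int x)
{
    return x * 2;
}

/* Hypothetical companion check: returns 1 if twice(x) cannot overflow. */
int twice_is_safe(int x)
{
    return x >= INT_MIN / 2 && x <= INT_MAX / 2;
}

/* Possible ways to build a program that calls twice():
     cc -Wall -Wextra file.c          compile-time warnings
     cc -ftrapv file.c                trap on signed overflow at run time
     cc -fsanitize=undefined file.c   run-time report on undefined behavior
   Only a run that actually calls twice() with an out-of-range argument
   gives the run-time checks something to report. */
```

The point of the companion check is that run-time tools only see the inputs you actually exercise, so tests need to reach the edges of the valid range.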

Now, regarding the ontological reasons why undefined behavior exists ... It mainly concerns performance and ease of implementation. Many things that are UB in C allow the compiler to optimize some things that other languages cannot optimize. If you, for example, compare how Java, Python and C handle overflow of signed integer types, you can see that at one extreme Python completely defines it in a way convenient for the programmer: ints can actually become arbitrarily large. C, at the other end of the spectrum, leaves it undefined: it is your responsibility never to overflow your integers. Java is somewhere in between.

But on the other hand, this means that in Python it is not known how much work "int + int" will actually do when executed. It can execute many hundreds of instructions, make a round trip through the operating system to allocate some memory, etc. This is very bad if you care about performance, or rather, about consistent performance. C, at the other end of the spectrum, allows the compiler to map "+" to the native CPU instruction that adds integers (if one exists). Of course, different CPUs can handle overflow in different ways, but since C leaves it undefined, that's fine: you, as the programmer, have to take care not to overflow your ints. This means that C gives the compiler the ability to compile your "int + int" operations to a single machine instruction on almost all processors, something compilers can and do take advantage of.

Note that C does not guarantee that + actually maps directly to a native CPU instruction; it just leaves the compiler free to do it that way, and obviously any compiler writer would be eager to take advantage of that. Java's way of defining signed integer overflow is less unpredictable (in terms of performance) than Python's, but it may prevent + from being turned into a single CPU instruction on some types of processors where C would allow it.

Thus, C embraces undefined behavior and chooses (consistent) speed and ease of implementation where other languages choose safety or predictable behavior (from the programmer's point of view). This is not necessarily a good decision, for example in terms of safety / security, but it is what C stands for. It comes down to "knowing the appropriate tool for the job", and there are definitely many cases where the predictability C gives you is absolutely necessary.
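Since C leaves signed overflow undefined, "taking care not to overflow" falls to the programmer. One common approach, sketched here with a hypothetical helper name, is to test the operands against INT_MAX and INT_MIN before performing the addition, so the overflowing sum is never computed.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: stores x + y in *out and returns true when the
   sum fits in an int. Returns false, without performing the addition,
   when it would overflow. The range checks themselves cannot overflow. */
bool checked_add(int x, int y, int *out)
{
    if ((y > 0 && x > INT_MAX - y) ||
        (y < 0 && x < INT_MIN - y)) {
        return false;
    }
    *out = x + y;
    return true;
}
```

The cost is a couple of comparisons per addition, which is exactly the kind of overhead C lets you skip when you already know the operands are in range.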

+2
