In C99, is f() + g() undefined or merely unspecified?

I used to think that in C99, even if the side effects of the functions f and g interfere, and although the expression f() + g() itself contains no sequence point, f and g would contain some, so the behavior would merely be unspecified: either f() is called before g(), or g() before f().

I'm not sure anymore. What if the compiler inlines the functions (which it may decide to do even if they are not declared inline) and then reorders the instructions? Is it possible to get a result different from the two above? In other words, is this behavior undefined?

This is not because I intend to write such code; it is so I can choose the best label for such a statement in a static analyzer.

+53
c undefined-behavior sequence-points unspecified-behavior c99
Oct 16 '10
3 answers

The expression f() + g() contains at least 4 sequence points: one before the call to f() (after all zero of its arguments have been evaluated); one before the call to g() (after all zero of its arguments have been evaluated); one as the call to f() returns; and one as the call to g() returns. Further, the two sequence points associated with f() occur either both before or both after the two sequence points associated with g(). What you cannot tell is which order the sequence points occur in, that is, whether the f-points come before the g-points or vice versa.

Even if the compiler inlines the code, it must obey the "as if" rule: the code must behave as if the functions were not interleaved. That limits the scope for damage (assuming the code contains no undefined behavior to begin with, of course).

Thus, the order in which f() and g() are evaluated is unspecified. But everything else is pretty clean.




A comment by supercat asks:

I would expect function calls in the source code to remain sequence points even if the compiler decides on its own to inline them. Does that also apply to functions declared inline, or does the compiler get extra latitude?

I believe the "as if" rule applies, and the compiler does not get extra latitude to omit the sequence points just because it inlines an explicitly inline function. The main reason to think so (being too lazy to look up the exact wording in the standard) is that the compiler is allowed to inline the function or not at its discretion, but the behavior of the program must not change (apart from performance).

Additionally, what about the expression (a(),b()) + (c(),d())? Could c() and/or d() be executed between a() and b(), or could a() or b() be executed between c() and d()?

  • It is clear that a is executed before b, and c before d. I believe that c and d could be executed between a and b, although it is rather unlikely a compiler would generate such code; similarly, a and b could be executed between c and d. And although I wrote "and" in "c and d", it could be "or"; that is, any of the following sequences of operations meets the constraints:

    • Definitely allowed:
      • abcd
      • cdab
    • Possibly allowed (preserving a ≺ b and c ≺ d):
      • acbd
      • acdb
      • cadb
      • cabd


    I believe that covers all possible sequences. See also the chat between Jonathan Leffler and AnArrayOfFunctions; the gist is that AnArrayOfFunctions does not think the "possibly allowed" sequences are permitted at all.

If such a thing were possible, it would mean a significant difference between inline functions and macros.

There are significant differences between inline functions and macros, but I do not think the sequencing in the expression is one of them. That is, any of the functions a, b, c or d could be replaced by a macro, and the same sequencing of the macro bodies could occur. The main difference, it seems to me, is that with inline functions there are guaranteed sequence points at the function calls, as outlined in the main answer, as well as at the comma operators. With macros, you lose the function-related sequence points. (So maybe that is a significant difference...) However, in many ways the issue is rather like the question of how many angels can dance on the head of a pin; it is not very important in practice. If someone presented me with the expression (a(),b()) + (c(),d()) in a code review, I would tell them to rewrite the code to make it clear:

 a(); c(); x = b() + d(); 

And that assumes there is no critical ordering requirement between b() and d().

+25
Oct 16 '10 at 23:02

See Annex C for a list of the sequence points. Function calls (the point between all the arguments being evaluated and execution passing to the function) are sequence points. As you said, it is unspecified which function is called first, but each of the two functions will either see all the side effects of the other, or none at all.

+14
Oct 16 '10 at 10:28

@dmckee

Well, this does not fit in a comment, but here is the thing:

First you write a correct static analyzer. "Correct", in this context, means that it will not remain silent if there is anything dubious in the analyzed code, so at this stage you cheerfully conflate undefined and unspecified behaviors. They are both bad and unacceptable in critical code, and you rightly warn about both of them.

But you only want to warn once about each possible error, and you also know that your analyzer will be evaluated in benchmarks in terms of "precision" and "recall" against other, possibly less correct analyzers, so you must not warn twice about the same problem... be it a true or a false alarm (you do not know which; you never know which, otherwise it would be too easy).

So you want to issue one warning for

 *p = x; y = *p; 

because, once p has been assumed to be a valid pointer in the first statement, it can be assumed to be a valid pointer in the second statement as well. Not doing this would lower your score on the precision metric.

So you teach your analyzer to assume that p is a valid pointer as soon as it has warned about it the first time in the code above, so that it does not warn a second time. More generally, you teach it to ignore the values (and execution paths) that correspond to things it has already warned about.

Then you realize that not many people write critical code, so you build other, lighter analyses for the rest of them, based on the results of the initial correct analysis. Say, a C program slicer.

And you tell "them": you do not need to check all the (possibly numerous false) alarms emitted by the first analysis. The sliced program behaves the same as the original program as long as none of them is triggered. The slicer produces programs that are equivalent with respect to the slicing criterion for "defined" execution paths.

And users happily ignore the alarms and use the slicer.

And then you realize that there may be a misunderstanding. For instance, most implementations of memmove (you know, the one that handles overlapping blocks) actually invoke unspecified behavior when called with pointers that do not point into the same block (they compare addresses that do not point into the same object). Your analyzer ignores both execution paths, since both are unspecified, but in reality both execution paths are equivalent and all is well.

So there must not be any misunderstanding about the meaning of the alarms, and if someone intends to ignore some of them, only the true errors, the undefined behaviors, should be excluded.

And this is why you become so interested in distinguishing between undefined behavior and unspecified behavior: nobody can blame you for ignoring the latter. But programmers will write the former without even thinking about it, and when you say that your slicer eliminates the "incorrect behaviors" of the program, it will not mean what they want it to mean.

And that is the end of the story, which definitely did not fit in a comment. My apologies to those who read this far.

+1
Oct 17 '10 at 3:21


