C gives different results based on the level of optimization (new example)

Based on this very good blog post, "The Strict Aliasing Situation Is Pretty Bad", I have put some of the code online for you to try:

http://cpp.sh/9kht (output changes between -O0 and -O2)

#include <stdio.h>

long foo(int *x, long *y)
{
    *x = 0;
    *y = 1;
    return *x;
}

int main(void)
{
    long l;
    printf("%ld\n", foo((int *)&l, &l));
}

  • Is there any undefined behavior here?

  • What happens inside when we choose the -O2 level?

2 answers
  • Yes, this program has undefined behavior because of the type-based aliasing rules, which can be summed up as "you cannot access a memory location declared with type A through a pointer of type B, unless B is a pointer to a character type (e.g. unsigned char *)." This is an approximation, but it is close enough for most purposes. Note that the exception does not work in the other direction: when A is a character type, B may not be anything else. Yes, that means the common idiom of reading a byte buffer four bytes at a time through a uint32_t * is undefined behavior (the blog post touches on this too).

  • When compiling foo, the compiler assumes that x and y cannot point to the same object. It follows that the write through *y cannot change the value of *x, so it can simply return the known value of *x, which is 0, without re-reading it from memory. This only happens when optimization is turned on, because keeping track of what each pointer can and cannot point to is expensive (so compilation is slower).

    Note that this is a "demons fly out of your nose" situation: the compiler would be within its rights to make the generated code for foo start with

     cmp rx, ry
     beq __crash_the_program
     ...

    (and a tool like UBSan might do just that)


Put another way, the code (int *)&l says to treat the pointer as a pointer to an int. It does not convert anything. So the (int *) cast tells the compiler to let you pass a long * to a function expecting an int *. You are lying to it. Inside, foo expects x to be a pointer to an int, but it is not. The memory layout is not what it should be. The results, as you see, are unpredictable.

On another note, I would never use l (ell) as a variable name. It is too easy to confuse with 1 (one). For example, what is this?

 int x = l; 