Access to the same memory location twice, UB or not?

In this thread, the top answer received a lot of votes and even a bounty. It proposes the following algorithm:

void RemoveSpaces(char* source)
{
  char* i = source;
  char* j = source;
  while(*j != 0)
  {
    *i = *j++; // UB?
    if(*i != ' ')
      i++;
  }
  *i = 0;
}

My knee-jerk reaction was that this code invokes undefined behavior, because i and j point at the same memory location, and an expression such as *i = *j++; will access the same variable twice, for purposes other than determining the value to store, with no sequence point in between. Even though they are two different variables, they initially point at the same memory location.

However, I'm not sure, since I do not quite see how two unsequenced accesses to the same memory location could do any harm in practice.

Am I correct in stating that this is undefined behavior? And if so, are there any examples of how relying on such UB could cause harmful behavior?


EDIT

The relevant parts of the C standard that would label this as UB are:

C99 6.5

Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be read only to determine the value to be stored.

C11 6.5

If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined. If there are multiple allowable orderings of the subexpressions of an expression, the behavior is undefined if such an unsequenced side effect occurs in any of the orderings.

The actual meaning of the text should be the same in both versions of the standard, but I find the C99 text much easier to read and understand.

+5
4 answers

There are two situations where accessing the same object twice without an intervening sequence point is undefined behavior:

  • If you modify the same object twice. For instance:

     int x = (*p = 1, 1) + (*p = 2, 100); 

    Obviously, you won't know whether *p is 1 or 2 after this, but the wording in the C standard says that the behavior is undefined even if you write

     int x = (*p = 1, 1) + (*p = 1, 100); 

    so storing the same value twice will not save you.

  • If you modify an object but also read it without using the value read to determine the new value of the object. This means that

     *p = *p + 1; 

    is fine, because you read *p and you modify *p, but you read *p in order to determine the value stored into *p.
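
The second rule can be related back to the question with a small sketch. The names store_successor and bump_via_alias are made up for illustration and are not from the thread. The idea is that a single read used to compute the stored value stays well-defined even when the two pointers alias, since the object is modified once and its prior value is read only to determine what to store:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: stores *src + 1 into *dst.
   Still well-defined when dst and src point at the same int,
   because the only read of that int feeds the value being stored. */
void store_successor(int *dst, const int *src)
{
    assert(dst != NULL && src != NULL);
    *dst = *src + 1;
}

/* Hypothetical driver: increments n through two aliasing pointers
   and returns the result. */
int bump_via_alias(int n)
{
    int *p = &n;
    int *q = &n;            /* q aliases p */
    store_successor(p, q);  /* same shape as *p = *p + 1 */
    return n;
}
```

Calling bump_via_alias(41) should give 42, just as *p = *p + 1 would.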

+6

There is no UB (this is even idiomatic C), because:

  • *i is modified only once (in *i = )
  • j is modified only once (in *j++ )

Of course, in the posted code, i and j can point to the same place (and do so on the first pass), but ... they are still different variables. So, in the line *i = *j++; :

  • the addresses are read from both pointers ( i and j )
  • the previous value is read ( *j++ ) and used to determine the stored value
  • only the pointer j is modified
  • source is modified through an unmodified pointer

So in the end this is not UB.
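
To make the bullet points above concrete, here is a minimal sketch of the same algorithm with the accesses annotated. The names remove_spaces and strips_to, and the 64-byte buffer, are my own assumptions for the sake of a small self-checking example; they are not part of the original answer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same algorithm as the RemoveSpaces function in the question. */
void remove_spaces(char *source)
{
    assert(source != NULL);
    char *i = source;   /* write position */
    char *j = source;   /* read position  */
    while (*j != '\0') {
        /* Reads i, reads j, reads the char j points at,
           modifies j once, modifies the char i points at once. */
        *i = *j++;
        if (*i != ' ')
            i++;
    }
    *i = '\0';
}

/* Hypothetical check: copies input into a local buffer, strips the
   spaces in place, and reports whether the result equals expected. */
int strips_to(const char *input, const char *expected)
{
    char buf[64];       /* assumed large enough for the examples */
    assert(strlen(input) < sizeof buf);
    strcpy(buf, input);
    remove_spaces(buf);
    return strcmp(buf, expected) == 0;
}
```

On the first pass i and j hold the same address, yet each object (the pointer j, and the character that i points at) is modified only once per statement.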


The following lines would invoke UB:

 *i = *j++ + *j++; // UB: j modified twice
 i = i++ + j;      // UB: i modified twice
+3

I do not think this will cause UB. In my opinion, this is as good as saying

 int k=0; k=k; //useless but does no harm 

There would be no harm in reading data from memory, and then writing it to the same position.

0

Break down the expression *i = *j++ . In order of precedence of the three operators: ++ (post-increment) is the highest, then * (pointer dereference), and = is the lowest.

So j++ is handled first (its result is the original value of j, and its side effect is to increment j ). So the expression is equivalent to

  temp = j++; *i = *temp; 

where temp is a compiler-generated temporary of pointer type. Neither of the two statements here has undefined behavior, so the original expression does not have undefined behavior either.
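
As a rough check of that equivalence, the sketch below runs the compact form and the expanded form in place on identical buffers (so i and j alias, as in the question) and compares the results. The function names and the 32-byte buffers are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy loop using the compact statement from the question. */
void copy_compact(char *dst, const char *src)
{
    char *i = dst;
    const char *j = src;
    while (*j != '\0') {
        *i = *j++;
        i++;
    }
    *i = '\0';
}

/* Same loop with the statement split as described above. */
void copy_expanded(char *dst, const char *src)
{
    char *i = dst;
    const char *j = src;
    while (*j != '\0') {
        const char *temp = j++;  /* temp gets the old value of j */
        *i = *temp;
        i++;
    }
    *i = '\0';
}

/* Hypothetical check: runs both forms in place and compares them. */
int same_copy(const char *text)
{
    char a[32], b[32];           /* assumed large enough */
    assert(strlen(text) < sizeof a);
    strcpy(a, text);
    strcpy(b, text);
    copy_compact(a, a);          /* dst and src are the same buffer */
    copy_expanded(b, b);
    return strcmp(a, b) == 0 && strcmp(a, text) == 0;
}
```

Both forms should leave the buffer unchanged, since every character is read from and written back to the same position.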

0
