Constants and compile-time evaluation - why change this behavior

If you watch this video by Eric Lippert, he describes a change made to the C# compiler that renders the following code invalid (apparently, up to and including .NET 2 this code would have compiled).

int y; int x = 10; if (x * 0 == 0) y = 123; Console.Write(y); 

Now, I understand that any execution of the above code obviously evaluates to:

 int y; int x = 10; y = 123; Console.Write(y); 

But I don't understand why it is considered "desirable" to make this code fail to compile. In other words: what are the risks of allowing such inferences to run their course?

+7
2 answers

I still find this question a bit confusing, but let me see if I can rephrase the question in a form that I can answer. First, let me re-formulate the background of the question:

In C# 2.0, this code:

 int x = 123; int y; if (x * 0 == 0) y = 345; Console.WriteLine(y); 

was treated as though you had written

 int x = 123; int y; if (true) y = 345; Console.WriteLine(y); 

which, in turn, was treated as:

 int x = 123; int y; y = 345; Console.WriteLine(y); 

This is a legal program.

But in C# 3.0 we took a breaking change to prevent this. The compiler no longer treats the condition as "always true", despite the fact that you and I both know that it always is. We now make this an illegal program, because the compiler reasons that it does not know that the body of the "if" is always executed, and therefore does not know that the local variable y is always assigned before it is used.

Why is the C# 3.0 behavior correct?

This is correct because the specification states that:

  • a constant expression must contain only constants. x * 0 == 0 is not a constant expression because it contains a non-constant term, x.

  • the consequence of an if is known to be always reachable only if the condition is a constant expression equal to true.

Therefore, the compiler should not classify the consequence of the conditional statement as always reachable, and therefore should not classify the local y as definitely assigned.
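A minimal sketch of the distinction these two rules draw, assuming a current C# compiler. The names c and z are hypothetical, introduced only for contrast with the question's x: once the operand is declared const, the condition is a constant expression, so the consequence is known to be reachable.

```csharp
using System;

static class Program
{
    static void Main()
    {
        const int c = 10;      // c is a compile-time constant (hypothetical name)
        int y;
        if (c * 0 == 0)        // constant expression that folds to true
            y = 123;
        Console.WriteLine(y);  // accepted: y is definitely assigned; prints 123

        // With a non-constant local, the same shape is rejected:
        //   int x = 10; int z;
        //   if (x * 0 == 0) z = 123;
        //   Console.WriteLine(z);   // error CS0165: use of unassigned local variable 'z'
    }
}
```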

Why is it desirable that a constant expression contains only constants?

We want the C# language to be clearly understandable by its users and correctly implementable by compiler writers. Requiring the compiler to make every possible logical deduction about the values of expressions works against those goals. It should be simple to determine whether a given expression is a constant and, if so, what its value is. Put simply, the constant evaluation code has to know how to perform arithmetic, but it does not need to know facts about arithmetic manipulations. The constant evaluator knows how to multiply 2 * 1, but it does not need to know the fact that "1 is the multiplicative identity on integers".
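To make "knows how to do arithmetic, but not facts about arithmetic" concrete, here is a rough sketch of such an evaluator. The Expr, Constant, Variable and Multiply types and the TryFold method are hypothetical names invented for this illustration; they are not the real compiler's data structures. The sketch assumes C# 9 or later (records).

```csharp
using System;

// Hypothetical mini-AST, for illustration only.
abstract record Expr;
record Constant(int Value) : Expr;
record Variable(string Name) : Expr;
record Multiply(Expr Left, Expr Right) : Expr;

static class ConstantEvaluator
{
    // Returns the folded value, or null when the expression is not constant.
    // It multiplies known values; it deliberately does not know that
    // "anything times zero is zero".
    public static int? TryFold(Expr e) => e switch
    {
        Constant c => c.Value,
        Variable => null,
        Multiply m => TryFold(m.Left) is int l && TryFold(m.Right) is int r
            ? (int?)(l * r)
            : null,
        _ => null
    };
}

static class Program
{
    static void Main()
    {
        var twoTimesOne = new Multiply(new Constant(2), new Constant(1));
        var xTimesZero = new Multiply(new Variable("x"), new Constant(0));

        // Prints 2: both operands are constants, so plain arithmetic suffices.
        Console.WriteLine(ConstantEvaluator.TryFold(twoTimesOne)?.ToString() ?? "not constant");
        // Prints "not constant": x is not a constant, and no algebraic fact is applied.
        Console.WriteLine(ConstantEvaluator.TryFold(xTimesZero)?.ToString() ?? "not constant");
    }
}
```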

Now, a compiler author may decide that there are areas in which they can be clever and thereby generate more optimal code. Compiler authors are permitted to do so, but not in a way that changes whether code is legal or illegal. They are only permitted to make optimizations that improve the compiler's output when it is given legal code.

How did the bug happen in C# 2.0?

What happened was that the compiler was written to run the arithmetic optimizer too early. The optimizer is the part that is supposed to be clever, and it should have run after the program was determined to be legal. It was running before the program was determined to be legal, and was therefore influencing the result.

This was a potential breaking change: although it brought the compiler into line with the specification, it also potentially turned working code into erroneous code. What motivated the change?

LINQ features, and specifically expression trees. If you said something like:

 (int x)=>x * 0 == 0 

and converted that to an expression tree, would you expect it to generate the expression tree for

 (int x)=>true 

? Probably not! You probably expected it to produce the expression tree for "multiply x by zero and compare the result to zero". Expression trees should preserve the logical structure of the expression in the body.

When I wrote the expression tree code, it was not yet clear whether the design committee was going to decide whether
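This can be observed with the System.Linq.Expressions API. A small sketch, assuming default compiler settings (unchecked arithmetic); the variable names are illustrative:

```csharp
using System;
using System.Linq.Expressions;

static class Program
{
    static void Main()
    {
        Expression<Func<int, bool>> e = x => x * 0 == 0;

        // The body is still a comparison, not the constant true.
        Console.WriteLine(e.Body.NodeType);   // Equal

        // Its left operand is still the multiplication x * 0.
        var comparison = (BinaryExpression)e.Body;
        Console.WriteLine(comparison.Left.NodeType);   // Multiply
    }
}
```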

 ()=>2 + 3 

was going to generate the expression tree for "add two to three" or the expression tree for "five". We decided on the latter: constants are folded before expression trees are generated, but arithmetic should not be run through the optimizer before expression trees are generated.

So now consider the dependencies we have just stated:

  • Arithmetic optimization must happen before codegen.
  • Expression tree rewriting must happen before arithmetic optimization.
  • Constant folding must happen before expression tree rewriting.
  • Constant folding must happen before flow analysis.
  • Flow analysis must happen before expression tree rewriting (because we need to know whether an expression tree uses an uninitialized local).

We have to find an order in which to do all this work that honors all of these dependencies. The C# 2.0 compiler did them in this order:
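The constant-folding half of that decision is also visible through the expression tree API. A brief sketch (the variable name sum is illustrative):

```csharp
using System;
using System.Linq.Expressions;

static class Program
{
    static void Main()
    {
        Expression<Func<int>> sum = () => 2 + 3;

        // The body is a single constant node, not an Add node.
        Console.WriteLine(sum.Body.NodeType);   // Constant
        Console.WriteLine(sum.Body);            // 5
    }
}
```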

  • constant folding and arithmetic optimization at the same time
  • flow analysis
  • Codegen

Where can expression tree rewriting go in there? Nowhere! And clearly this is buggy, because flow analysis is now taking into account facts deduced by the arithmetic optimizer. We decided to rework the compiler so that it did things in this order:

  • constant folding
  • flow analysis
  • expression tree rewriting
  • arithmetic optimization
  • codegen

This obviously requires a breaking change.
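The dependencies and the two orderings can be checked mechanically. The following is only an illustrative sketch: the phase names are plain strings taken from the lists above, and the Honors helper is a hypothetical name, not anything from the actual compiler. The C# 2.0 pipeline is modeled with its merged first step split in two, which is an assumption made for the sake of the check.

```csharp
using System;
using System.Linq;

static class Program
{
    // Each pair means: Before must run earlier than After.
    static readonly (string Before, string After)[] Dependencies =
    {
        ("arithmetic optimization", "codegen"),
        ("expression tree rewriting", "arithmetic optimization"),
        ("constant folding", "expression tree rewriting"),
        ("constant folding", "flow analysis"),
        ("flow analysis", "expression tree rewriting"),
    };

    // True when every dependency's phases are present and correctly ordered.
    static bool Honors(string[] order) =>
        Dependencies.All(d =>
        {
            int before = Array.IndexOf(order, d.Before);
            int after = Array.IndexOf(order, d.After);
            return before >= 0 && after >= 0 && before < after;
        });

    static void Main()
    {
        string[] reworked =
        {
            "constant folding", "flow analysis", "expression tree rewriting",
            "arithmetic optimization", "codegen",
        };
        string[] csharp2 =
        {
            "constant folding", "arithmetic optimization", "flow analysis", "codegen",
        };

        Console.WriteLine(Honors(reworked));  // True
        Console.WriteLine(Honors(csharp2));   // False: no slot for expression tree rewriting
    }
}
```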

Now, I did consider preserving the existing broken behavior by doing this:

  • constant folding
  • arithmetic optimization
  • flow analysis
  • arithmetic de-optimization
  • expression tree rewriting
  • arithmetic optimization again
  • codegen

where the optimized arithmetic expression would contain a pointer back to its unoptimized form. We decided that this was too much complexity just to preserve a bug. We decided that it would be better to fix the bug instead, take the breaking change, and make the compiler architecture easier to understand.

+8

The specification states that the definite assignment of something that is only assigned inside an if block is undetermined. The specification says nothing about compiler magic that removes an unnecessary if block. In particular, such magic makes for a very confusing error message when you change the if condition and suddenly get an error about y not being assigned: "Huh? I haven't changed when y is assigned!".

The compiler is free to perform any obvious code removal it wants, but first it must follow the rules of the specification.

In particular, section 5.3.3.5 (MS 4.0 specification):

5.3.3.5 If statements

For an if statement stmt of the form:

if ( expr ) then-stmt else else-stmt

  • v has the same definite assignment state at the beginning of expr as at the beginning of stmt.
  • If v is definitely assigned at the end of expr, then it is definitely assigned on the control flow transfer to then-stmt and to either else-stmt or to the end-point of stmt if there is no else clause.
  • If v has the state "definitely assigned after true expression" at the end of expr, then it is definitely assigned on the control flow transfer to then-stmt, and not definitely assigned on the control flow transfer to either else-stmt or to the end-point of stmt if there is no else clause.
  • If v has the state "definitely assigned after false expression" at the end of expr, then it is definitely assigned on the control flow transfer to else-stmt, and not definitely assigned on the control flow transfer to then-stmt. It is definitely assigned at the end-point of stmt if and only if it is definitely assigned at the end-point of then-stmt.
  • Otherwise, v is considered not definitely assigned on the control flow transfer to either then-stmt or else-stmt, or to the end-point of stmt if there is no else clause.

For an initially unassigned variable to be considered definitely assigned at a certain location, an assignment to the variable must occur in every possible execution path leading to that location.

Technically, an execution path exists where the if condition is false; if y were also assigned in the else, then fine, but... the specification explicitly makes no demand that the compiler detect that the if condition is always true.
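As a short illustration of the "also assigned in the else" case, here is a sketch based on the question's snippet; the else branch and its value 0 are additions made purely for this example:

```csharp
using System;

static class Program
{
    static void Main()
    {
        int x = 10;
        int y;
        if (x * 0 == 0)
            y = 123;
        else
            y = 0;   // hypothetical fallback so that every path assigns y

        // Accepted: y is assigned on both branches, so the compiler does not
        // need to prove anything about the condition. Prints 123.
        Console.WriteLine(y);
    }
}
```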

+3