Will the C# compiler optimize this code?

Question

I run into this scenario often. At first glance, I think: "That's poor coding; I'm executing a method twice and necessarily getting the same result." But then I have to wonder whether the compiler is as smart as I am and can come to the same conclusion.

    var newList = oldList.Select(x => new Thing {
        FullName = String.Format("{0} {1}", x.FirstName, x.LastName),
        OtherThingId = x.GetOtherThing() != null ? x.GetOtherThing().Id : 0 // Might call x.GetOtherThing() twice?
    });

Does the compiler's behavior depend on the contents of the GetOtherThing method? Say it looks like this (somewhat similar to my real code right now):

    public OtherThing GetOtherThing() {
        if (this.Category == null) return null;
        return this.Category.OtherThings.FirstOrDefault(t => t.Text == this.Text);
    }

That will, barring very poorly handled asynchronous changes to whatever store these objects come from, definitely return the same thing if run twice in a row. But what if it looked like this (a nonsensical example for the sake of argument):

    public OtherThing GetOtherThing() {
        return new OtherThing { Id = new Random().Next(100) };
    }

Running that twice in a row would create two different objects, in all likelihood with different Ids. What would the compiler do in these situations? Is it as inefficient as it seems to do what I showed in my first listing?

Doing some of the work myself

~~I ran something very similar to that first code listing and put a breakpoint in the GetOtherThing instance method.~~ ~~The breakpoint was hit once.~~ ~~So it looks like the result is indeed cached.~~ ~~What would happen in the second case, where the method could return something different each time?~~ ~~Would the compiler optimize incorrectly?~~ ~~Are there any caveats to the result I found?~~

EDIT

That conclusion was invalid. See the comments under @usr's answer.

+6

optimization compiler-construction c# .net

Andrew Feb 18 '14 at 20:51


2 answers

There are two compilers in play here: the C# compiler, which turns C# into IL, and the IL compiler, which turns IL into machine code. The latter is called the jitter because it runs Just In Time.

The Microsoft C# compiler certainly does no such optimization. Method calls are generated as method calls, end of story.
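A minimal sketch of what that means for the code in the question, assuming a hypothetical `Calls` counter added only to make the invocations visible (`Item`, `Calls`, and `CallCountDemo` are illustration names, not part of the original code). Note that the counter is itself a side effect, so this shows what the language semantics require rather than what the jitter might do with a side-effect-free method:

```csharp
using System;

public class OtherThing { public int Id { get; set; } }

public class Item
{
    // Hypothetical counter, present only so the number of calls can be observed.
    public static int Calls;

    public OtherThing GetOtherThing()
    {
        Calls++;
        return new OtherThing { Id = 42 };
    }
}

public static class CallCountDemo
{
    // The pattern from the question: the call appears twice in the expression.
    public static int CallTwice(Item x)
    {
        return x.GetOtherThing() != null ? x.GetOtherThing().Id : 0;
    }

    // Manual hoisting: evaluate once, then reuse the local.
    public static int CallOnce(Item x)
    {
        var other = x.GetOtherThing();
        return other != null ? other.Id : 0;
    }

    public static void Main()
    {
        var x = new Item();

        Item.Calls = 0;
        CallTwice(x);
        Console.WriteLine(Item.Calls); // 2

        Item.Calls = 0;
        CallOnce(x);
        Console.WriteLine(Item.Calls); // 1
    }
}
```

Hoisting the call into a local, as in `CallOnce`, guarantees a single call regardless of what either compiler does. In C# 6 and later, `x.GetOtherThing()?.Id ?? 0` expresses the same thing as a single expression, which also fits inside the object initializer from the question.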

The jitter is permitted to perform the optimization you describe, provided that doing so cannot be detected. For example, suppose you had:

 y = M() != 0 ? M() : N()

and

 static int M() { return 1; }

The jitter is permitted to turn this program into:

 y = 1 != 0 ? 1 : N()

or for that matter

 y = 1;

Whether the jitter does so or not is an implementation detail; if you care, you will have to ask an expert on the jitter whether it actually performs this optimization.

Similarly, if you have

 static int m; static int M() { return m; }

then the jitter could optimize this into

 y = m != 0 ? m : N()

or even into:

 int q = m; y = q != 0 ? q : N();

because the jitter is permitted to turn two field reads in a row, with no intervening write, into a single field read, provided that the field is not volatile. Again, whether it does so or not is an implementation detail; ask a jitter developer.

In your last example, however, the jitter cannot elide the second call because it has a side effect.
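To make the side-effect point concrete, here is a small sketch that swaps the question's `Random`-based method for a deterministic counter, so the result is reproducible. The names `SideEffectDemo`, `nextId`, and `Evaluate` are hypothetical and used only for illustration:

```csharp
using System;

public static class SideEffectDemo
{
    // Hypothetical state that every call to M() mutates.
    private static int nextId;

    // Stand-in for the question's Random-based method: each call has a side effect.
    public static int M()
    {
        nextId++;
        return nextId;
    }

    public static int N() { return -1; }

    public static int Evaluate()
    {
        // Two calls are required: the second one returns a different value.
        return M() != 0 ? M() : N();
    }

    public static void Main()
    {
        // On a fresh run the first M() returns 1 and the second returns 2.
        Console.WriteLine(Evaluate()); // 2
    }
}
```

If the second call were collapsed into the first, `Evaluate` would return 1 on a fresh run instead of 2, and the program could detect the difference. That is exactly the kind of rewrite the jitter is not permitted to make.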

> I ran something very similar to that first code listing and put a breakpoint in the GetOtherThing instance method. The breakpoint was hit once.

That is highly unlikely. Almost all optimizations are turned off when you are debugging, precisely so that the code is easier to debug. As Sherlock Holmes never said, when you eliminate the improbable, the most likely explanation is that the original poster was mistaken.

+11

Eric Lippert Feb 18 '14 at 22:12


usr · Accepted Answer · 2014-02-18T21:00:02+0000

The compiler can only apply this optimization if you cannot tell the difference. In your "random" example, you can clearly tell the difference, so it cannot be "optimized" that way. Doing so would violate the C# specification. In fact, the specification does not say much about optimizations at all. It just says what you should observe the program doing. In this case it specifies that two random numbers must be drawn.

In the first example, this optimization could in principle be applied. It will never happen in practice. Here are some things that make it hard:

- The data the query runs on could be changed by a virtual function call, or your lambda ( t => t.Text == this.Text ) could change the list. Very insidious.
- It could be modified by another thread. I am not sure what the .NET memory model says about this.
- It could be changed by reflection.
- It must be proven that the calculation always returns the same value. How would you prove that? You would need to analyze all the code that could possibly run, including virtual calls and data-dependent control flow.

All of this would have to work across non-inlined methods and across assemblies.
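As a hedged illustration of the virtual-call obstacle, the sketch below uses a hypothetical `SneakyThing` subclass whose overridden `Text` getter changes its answer after the first read. The types and the `Find` helper are invented for this example; the point is only that the same query, run twice in a row, can legitimately return different results:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Thing
{
    public virtual string Text { get; set; }
}

// Hypothetical subclass: reading Text has a hidden side effect.
public class SneakyThing : Thing
{
    private int reads;

    public override string Text
    {
        get { reads++; return reads <= 1 ? "a" : "b"; }
        set { base.Text = value; }
    }
}

public static class VirtualCallDemo
{
    // Same shape as the lookup in the question's GetOtherThing.
    public static Thing Find(List<Thing> items, string text)
    {
        return items.FirstOrDefault(t => t.Text == text);
    }

    public static void Main()
    {
        var items = new List<Thing> { new SneakyThing() };

        var first = Find(items, "a");
        var second = Find(items, "a");

        Console.WriteLine(first != null);  // True
        Console.WriteLine(second != null); // False
    }
}
```

Nothing in `Find` itself reveals this behavior. An optimizer would have to know every possible override of `Text`, including ones in assemblies loaded later, before it could safely merge the two calls.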

The C# compiler cannot do this because it cannot look into mscorlib. A patch release could change mscorlib at any time.

The JIT is a poor JIT (alas), and it is optimized for compilation speed (alas). It does not do this. If you are in doubt whether the current JIT will perform some advanced optimization, it is a safe bet that it will not.
