Why is a lambda faster than an IL-injected dynamic method?

I just built a dynamic method, see below (thanks to fellow SO users). It appears that the Func created as a dynamic method with IL injection is 2x slower than the lambda.

Does anyone know why exactly?

(EDIT: This was built as Release x64 in VS2010. Please run it from the console, not from inside Visual Studio with F5.)

    using System;
    using System.Diagnostics;
    using System.Reflection.Emit;

    class Program
    {
        static void Main(string[] args)
        {
            var mul1 = IL_EmbedConst(5);
            var res = mul1(4);
            Console.WriteLine(res);

            var mul2 = EmbedConstFunc(5);
            res = mul2(4);
            Console.WriteLine(res);

            double d, acc = 0;
            Stopwatch sw = new Stopwatch();

            for (int k = 0; k < 10; k++)
            {
                long time1;

                // First column: the lambda.
                sw.Restart();
                for (int i = 0; i < 10000000; i++)
                {
                    d = mul2(i);
                    acc += d;
                }
                sw.Stop();
                time1 = sw.ElapsedMilliseconds;

                // Second column: the IL-generated dynamic method.
                sw.Restart();
                for (int i = 0; i < 10000000; i++)
                {
                    d = mul1(i);
                    acc += d;
                }
                sw.Stop();

                Console.WriteLine("{0,6} {1,6}", time1, sw.ElapsedMilliseconds);
            }

            Console.WriteLine("\n{0}...\n", acc);
            Console.ReadLine();
        }

        // Emits IL for "arg0 * b" with b embedded as an inline constant.
        static Func<int, int> IL_EmbedConst(int b)
        {
            var method = new DynamicMethod("EmbedConst", typeof(int), new[] { typeof(int) });

            var il = method.GetILGenerator();
            il.Emit(OpCodes.Ldarg_0);
            il.Emit(OpCodes.Ldc_I4, b);
            il.Emit(OpCodes.Mul);
            il.Emit(OpCodes.Ret);

            return (Func<int, int>)method.CreateDelegate(typeof(Func<int, int>));
        }

        // The same computation as a lambda that captures b in a closure.
        static Func<int, int> EmbedConstFunc(int b)
        {
            return a => a * b;
        }
    }

Here is the output (on an i7 920):

    20
    20
        25     51
        25     51
        24     51
        24     51
        24     51
        25     51
        25     51
        25     51
        24     51
        24     51

    4.9999995E+15...

================================================================================

EDIT

Here is the proof that dhtorpe was right: a more complex lambda loses its edge. Code for the proof (it demonstrates that the lambda has exactly the same performance as the IL injection):

    using System;
    using System.Diagnostics;
    using System.Reflection.Emit;

    class Program
    {
        static void Main(string[] args)
        {
            var mul1 = IL_EmbedConst(5);
            double res = mul1(4, 6);
            Console.WriteLine(res);

            var mul2 = EmbedConstFunc(5);
            res = mul2(4, 6);
            Console.WriteLine(res);

            double d, acc = 0;
            Stopwatch sw = new Stopwatch();

            for (int k = 0; k < 10; k++)
            {
                long time1;

                // First column: the lambda.
                sw.Restart();
                for (int i = 0; i < 10000000; i++)
                {
                    d = mul2(i, i + 1);
                    acc += d;
                }
                sw.Stop();
                time1 = sw.ElapsedMilliseconds;

                // Second column: the IL-generated dynamic method.
                sw.Restart();
                for (int i = 0; i < 10000000; i++)
                {
                    d = mul1(i, i + 1);
                    acc += d;
                }
                sw.Stop();

                Console.WriteLine("{0,6} {1,6}", time1, sw.ElapsedMilliseconds);
            }

            Console.WriteLine("\n{0}...\n", acc);
            Console.ReadLine();
        }

        // Emits IL for "arg0 * b - Math.Log(arg1 * b)" with b as an inline constant.
        static Func<int, int, double> IL_EmbedConst(int b)
        {
            var method = new DynamicMethod("EmbedConstIL", typeof(double), new[] { typeof(int), typeof(int) });
            var log = typeof(Math).GetMethod("Log", new Type[] { typeof(double) });

            var il = method.GetILGenerator();
            il.Emit(OpCodes.Ldarg_0);
            il.Emit(OpCodes.Ldc_I4, b);
            il.Emit(OpCodes.Mul);
            il.Emit(OpCodes.Conv_R8);

            il.Emit(OpCodes.Ldarg_1);
            il.Emit(OpCodes.Ldc_I4, b);
            il.Emit(OpCodes.Mul);
            il.Emit(OpCodes.Conv_R8);
            il.Emit(OpCodes.Call, log);

            il.Emit(OpCodes.Sub);
            il.Emit(OpCodes.Ret);

            return (Func<int, int, double>)method.CreateDelegate(typeof(Func<int, int, double>));
        }

        // The same computation as a lambda that captures b in a closure.
        static Func<int, int, double> EmbedConstFunc(int b)
        {
            return (a, z) => a * b - Math.Log(z * b);
        }
    }
Jun 13 2018-12-12T00:
3 answers

Given that the performance difference exists only when running in release mode without a debugger attached, the only explanation I can think of is that the JIT compiler is able to make native code optimizations for the lambda expression that it is not able to perform for the emitted IL dynamic function.

Compiled in release mode (optimizations on) and run without a debugger attached, the lambda is consistently 2x faster than the generated IL dynamic method.

Running the same optimized release build with a debugger attached to the process drops the lambda's performance to comparable to, or worse than, the generated IL dynamic method.

The only difference between the two runs is the behavior of the JIT. When a process is being debugged, the JIT compiler suppresses a number of native code generation optimizations in order to preserve the mapping from native instructions to IL instructions to source code line numbers, along with other correlations that aggressive native instruction optimizations would destroy.

A compiler can only apply special-case optimizations when the input expression graph (in this case, the IL code) matches certain very specific patterns and conditions. The JIT compiler clearly has special knowledge of the lambda expression IL code pattern and emits different code for lambdas than for "normal" IL code.

It is quite possible that your IL instructions do not exactly match the pattern that causes the JIT compiler to optimize the lambda expression. For example, your IL instructions encode the value of b as an inline constant, whereas the analogous lambda expression loads a field from an internal captured-variable object instance. Even if your generated IL were to mimic the captured-field pattern of the lambda IL that the C# compiler generates, it still might not be "close enough" to receive the same JIT treatment as the lambda expression.
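To make the inline-constant versus captured-field distinction concrete, here is a hand-written approximation of what the C# compiler produces for the closure in EmbedConstFunc. The names DisplayClassSketch and ClosureSketch are hypothetical; the real generated names are compiler-internal, and this is only a sketch of the general shape.

```csharp
using System;

// Approximation of the compiler-generated closure class (names are hypothetical).
sealed class DisplayClassSketch
{
    public int b;                      // the captured variable becomes a field

    public int Invoke(int a)
    {
        return a * this.b;             // multiplies by a field load, not by a literal
    }
}

static class ClosureSketch
{
    // Roughly what "return a => a * b;" turns into.
    public static Func<int, int> EmbedConstFunc(int b)
    {
        var closure = new DisplayClassSketch { b = b };
        return closure.Invoke;         // delegate bound to the closure instance
    }

    static void Main()
    {
        var mul = EmbedConstFunc(5);
        Console.WriteLine(mul(4));     // prints 20
    }
}
```

Because b is read from a field at run time, the JIT cannot treat it as a known constant here, whereas the dynamic method in the question carries it as an ldc.i4 literal.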

As mentioned in the comments, this may well be due to inlining of the lambda to eliminate the call/return overhead. If so, I would expect this difference in performance to disappear with more complex lambda expressions, since inlining is usually reserved for only the simplest expressions.

Jun 14 2018-12-12T00:

The constant 5 was the cause. Why would that be? Reason: when the JIT knows that the constant is 5, it does not emit an imul instruction but a lea [rax + rax*4]. This is a well-known assembly-level optimization. But for some reason, this code ran slower. The optimization was a pessimization.
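The arithmetic behind that substitution can be checked directly: base plus index times four is the same as multiplying by five, including under 32-bit wraparound. The following is a small sketch; the name LeaIdentitySketch is hypothetical, and it only mirrors the arithmetic, not the generated machine code.

```csharp
using System;

static class LeaIdentitySketch
{
    // Same arithmetic as an address computation of the form x + x*4.
    public static int TimesFiveViaShiftAdd(int a)
    {
        unchecked { return a + (a << 2); }
    }

    static void Main()
    {
        foreach (int a in new[] { 0, 4, -7, 123456789, int.MaxValue })
        {
            int viaMul = unchecked(a * 5);
            // Both forms agree, even when the product overflows 32 bits.
            Console.WriteLine("{0}: {1}", a, viaMul == TimesFiveViaShiftAdd(a));
        }
    }
}
```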

And because the C# compiler emits a closure, the JIT could not optimize the lambda's code in that particular way.

Proof: change the constant to 56878567 and the performance changes. When inspecting the JITed code, you can see that an imul is used now.
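One way to try this experiment is to build the same dynamic method with each constant and time both. The sketch below reuses the IL from the question; the names ConstantSwapSketch, BuildMul and TimeMs are hypothetical, and the timings depend on the runtime, JIT version and hardware, so no particular result is assumed.

```csharp
using System;
using System.Diagnostics;
using System.Reflection.Emit;

static class ConstantSwapSketch
{
    // Same IL as IL_EmbedConst in the question: arg0 * <inline constant>.
    public static Func<int, int> BuildMul(int b)
    {
        var method = new DynamicMethod("Mul", typeof(int), new[] { typeof(int) });
        var il = method.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0);
        il.Emit(OpCodes.Ldc_I4, b);
        il.Emit(OpCodes.Mul);
        il.Emit(OpCodes.Ret);
        return (Func<int, int>)method.CreateDelegate(typeof(Func<int, int>));
    }

    // Calls f in a tight loop and returns elapsed milliseconds.
    public static long TimeMs(Func<int, int> f, int iterations, out long checksum)
    {
        long acc = 0;
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) acc += f(i);
        sw.Stop();
        checksum = acc;                // returned so the loop result is observed
        return sw.ElapsedMilliseconds;
    }

    static void Main()
    {
        var mulSmall = BuildMul(5);
        var mulLarge = BuildMul(56878567);
        long sum;

        for (int k = 0; k < 5; k++)
        {
            long t1 = TimeMs(mulSmall, 10000000, out sum);
            long t2 = TimeMs(mulLarge, 10000000, out sum);
            Console.WriteLine("{0,6} {1,6}", t1, t2);
        }
    }
}
```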

I managed to catch this by hard-coding the constant 5 into the lambda as follows:

    static Func<int, int> EmbedConstFunc2(int b)
    {
        return a => a * 5;
    }

This allowed me to inspect the JITed x86 code.

Sidenote: The .NET JIT does not inline delegate calls in any way. I mention this only because the comments falsely speculated that it does.

Sidenote 2: To get the full level of JIT optimization, you need to compile in Release mode and start the application without a debugger attached. The debugger prevents optimizations even in Release mode.
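A benchmark can guard against both conditions before it prints any numbers. This is a minimal sketch: the name BenchmarkGuardSketch is hypothetical, and it assumes the DEBUG symbol is defined only in debug builds, which is the default project setup but not guaranteed.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

static class BenchmarkGuardSketch
{
    // Collects reasons why timings from this process may not be trustworthy.
    public static List<string> CollectWarnings()
    {
        var warnings = new List<string>();

        if (Debugger.IsAttached)
            warnings.Add("A debugger is attached; the JIT may suppress optimizations.");

#if DEBUG
        // Assumption: DEBUG is defined only for non-optimized builds.
        warnings.Add("This is a debug build; compile in Release mode.");
#endif

        return warnings;
    }

    static void Main()
    {
        foreach (var warning in CollectWarnings())
            Console.WriteLine("Warning: " + warning);
    }
}
```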

Sidenote 3: Although EmbedConstFunc contains a closure and would normally be slower than the dynamically generated method, the "lea" optimization does more damage, so the dynamic method ends up slower.

Jun 14 2018-12-12T00:

A lambda is not faster than a DynamicMethod; it is based on one. However, while static methods are faster than instance methods, a delegate created for a static method is slower than a delegate created for an instance method. A lambda expression builds a static method but uses it like an instance method by adding a "Closure" as the first parameter. A delegate to a static method has to "pop" the stack to get rid of the unneeded "this" instance before it can "mov" to the real IL body. With a delegate to an instance method, the IL body is hit directly. This is why a delegate to a hypothetical static method built by a lambda expression is faster (perhaps a side effect of the delegate code pattern being shared between instance and static methods).

The performance problem can be avoided by adding an unused first argument (of the Closure type, for example) to the DynamicMethod and calling CreateDelegate with an explicit target instance (null can be used).

    var myDelegate = dynamicMethod.CreateDelegate(typeof(MyDelegateType), null) as MyDelegateType;
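Applied to the multiplication from the question, the workaround might look like the sketch below. The names ClosedDelegateSketch and BuildMul are hypothetical, and the extra parameter is typed as object rather than Closure to keep the example free of additional dependencies. The answer says null can be used as the target; this sketch passes a dummy object instead, which is an assumption made here for caution.

```csharp
using System;
using System.Reflection.Emit;

static class ClosedDelegateSketch
{
    public static Func<int, int> BuildMul(int b)
    {
        // Extra leading parameter that the body never reads.
        var method = new DynamicMethod(
            "MulClosed", typeof(int), new[] { typeof(object), typeof(int) });

        var il = method.GetILGenerator();
        il.Emit(OpCodes.Ldarg_1);      // the int argument is now at index 1
        il.Emit(OpCodes.Ldc_I4, b);
        il.Emit(OpCodes.Mul);
        il.Emit(OpCodes.Ret);

        // Binding a target to the first parameter yields a closed delegate,
        // so the delegate signature omits the unused parameter.
        var target = new object();
        return (Func<int, int>)method.CreateDelegate(typeof(Func<int, int>), target);
    }

    static void Main()
    {
        var mul = BuildMul(5);
        Console.WriteLine(mul(4));     // prints 20
    }
}
```

Whether this closes the gap with the lambda would need to be measured in a release build without a debugger attached.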

http://msdn.microsoft.com/fr-fr/library/z43fsh67(v=vs.110).aspx

Tony thong

Jul 05 '14 at 16:32


