Why does adding beforefieldinit dramatically improve the execution speed of generic classes?

Question

I am working on a proxy, and for generic classes with a reference type parameter it was very slow, especially for generic methods (about 400 ms versus 3200 ms for trivial generic methods that just returned null). I decided to see how it would perform if I rewrote the generated class in C#, and it performed much better, about the same as my non-generic class code.

Here is the C# class that I wrote (note that I changed the naming scheme, but not by much):

    namespace TestData
    {
        public class TestClassProxy<pR> : TestClass<pR>
        {
            private InvocationHandler<Func<TestClass<pR>, object>> _0_Test;
            private InvocationHandler<Func<TestClass<pR>, pR, GenericToken, object>> _1_Test;
            private static readonly InvocationHandler[] _proxy_handlers = new InvocationHandler[]
            {
                new InvocationHandler<Func<TestClass<pR>, object>>(new Func<TestClass<pR>, object>(TestClassProxy<pR>.s_0_Test)),
                new GenericInvocationHandler<Func<TestClass<pR>, pR, GenericToken, object>>(typeof(TestClassProxy<pR>), "s_1_Test")
            };

            public TestClassProxy(InvocationHandler[] handlers)
            {
                if (handlers == null)
                {
                    throw new ArgumentNullException("handlers");
                }
                if (handlers.Length != 2)
                {
                    throw new ArgumentException("Handlers needs to be an array of 2 parameters.", "handlers");
                }
                this._0_Test = (InvocationHandler<Func<TestClass<pR>, object>>)(handlers[0] ?? _proxy_handlers[0]);
                this._1_Test = (InvocationHandler<Func<TestClass<pR>, pR, GenericToken, object>>)(handlers[1] ?? _proxy_handlers[1]);
            }

            private object __0__Test()
            {
                return base.Test();
            }

            private object __1__Test<T>(pR local1) where T:IConvertible
            {
                return base.Test<T>(local1);
            }

            public static object s_0_Test(TestClass<pR> class1)
            {
                return ((TestClassProxy<pR>)class1).__0__Test();
            }

            public static object s_1_Test<T>(TestClass<pR> class1, pR local1) where T:IConvertible
            {
                return ((TestClassProxy<pR>)class1).__1__Test<T>(local1);
            }

            public override object Test()
            {
                return this._0_Test.Target(this);
            }

            public override object Test<T>(pR local1)
            {
                return this._1_Test.Target(this, local1, GenericToken<T>.Token);
            }
        }
    }

This compiles in release mode to the same IL as my generated proxy. Here is the class that it is proxying:

    namespace TestData
    {
        public class TestClass<R>
        {
            public virtual object Test()
            {
                return default(object);
            }

            public virtual object Test<T>(R r) where T:IConvertible
            {
                return default(object);
            }
        }
    }

There was one exception: I did not set the beforefieldinit attribute on the generated type. I only set the following attributes: public auto ansi

Why does using beforefieldinit improve performance so much?

(The only other difference was that I did not name my parameters, which does not matter in the grand scheme of things. Method and field names are scrambled to avoid clashing with real methods. GenericToken and the InvocationHandlers are implementation details that are irrelevant to the question.

GenericToken is literally just a typed holder for data, since it lets me pass "T" to the handler.

InvocationHandler is just a holder for the delegate field; there is no actual implementation in it.

GenericInvocationHandler uses a callsite technique, like the DLR, to rewrite the delegate as needed to handle the different generic arguments.)

EDIT: Here is the test harness:

    private static void RunTests(int count = 1 << 24, bool displayResults = true)
    {
        var tests = Array.FindAll(Tests, t => t != null);
        var maxLength = tests.Select(x => GetMethodName(x.Method).Length).Max();
        for (int j = 0; j < tests.Length; j++)
        {
            var action = tests[j];
            Stopwatch sw = Stopwatch.StartNew();
            for (int i = 0; i < count; i++)
            {
                action();
            }
            sw.Stop();
            if (displayResults)
            {
                Console.WriteLine("{2} {0}: {1}ms", GetMethodName(action.Method).PadRight(maxLength), ((int)sw.ElapsedMilliseconds).ToString(), j);
            }
            GC.Collect();
            GC.WaitForPendingFinalizers();
            GC.Collect();
        }
    }

    private static string GetMethodName(MethodInfo method)
    {
        return method.IsGenericMethod
            ? string.Format(@"{0}<{1}>", method.Name, string.Join<Type>(",", method.GetGenericArguments()))
            : method.Name;
    }

And in the test, I do the following:

    Tests[0] = () => proxiedTestClass.Test();
    Tests[1] = () => proxiedTestClass.Test<string>("2");
    Tests[2] = () => handClass.Test();
    Tests[3] = () => handClass.Test<string>("2");
    RunTests(100, false);
    RunTests();

Here Tests is a Func<object>[20], proxiedTestClass is the class generated by my assembly, and handClass is the one I wrote by hand. RunTests is called twice: once to warm things up, and once more to run the tests and print the results. I mostly took this code from a post here by Jon Skeet.

+6

c# cil il reflection.emit

Michael b Jan 28 '13 at 20:41

2 answers

First, if you want to know more about beforefieldinit, read Jon Skeet's article "C# and beforefieldinit". Parts of this answer are based on it, and I will repeat the relevant bits here.

Second, your code does very little, so any overhead has a significant impact on your measurements. In real code, the impact is likely to be much smaller.

Third, you do not need Reflection.Emit to control whether a class has beforefieldinit. You can turn the flag off in C# by adding a static constructor (for example, static TestClassProxy() {} ).
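A minimal sketch of that point, assuming only the standard System.Reflection API. The types WithFlag, WithoutFlag and FlagDemo are hypothetical and not part of the question's code. The sketch reads the flag back through TypeAttributes.BeforeFieldInit to show how an empty static constructor changes what the C# compiler emits:

```csharp
using System;
using System.Reflection;

// Hypothetical example types, not part of the question's code.
public class WithFlag
{
    // Only a static field initializer: the C# compiler marks the type beforefieldinit.
    public static readonly object Value = new object();
}

public class WithoutFlag
{
    public static readonly object Value = new object();

    // An explicit (even empty) static constructor makes the compiler omit beforefieldinit.
    static WithoutFlag() { }
}

public static class FlagDemo
{
    // True when the type's metadata carries the beforefieldinit flag.
    public static bool HasBeforeFieldInit(Type type)
    {
        return (type.Attributes & TypeAttributes.BeforeFieldInit) != 0;
    }

    public static void Main()
    {
        Console.WriteLine("WithFlag:    " + HasBeforeFieldInit(typeof(WithFlag)));
        Console.WriteLine("WithoutFlag: " + HasBeforeFieldInit(typeof(WithoutFlag)));
    }
}
```

The same check could be pointed at a type built with Reflection.Emit to confirm which attributes the generated proxy actually ended up with.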

What beforefieldinit does is determine when the type initializer (a method called .cctor ) is called. In C# terms, the type initializer contains all the static field initializers plus the code from the static constructor, if there is one.

If you do not set this flag, the type initializer is called when an instance of the class is created or when any static member of the class is referenced. (This is taken from the C# specification; using the CLI specification here would be more accurate, but the end result is the same. ^* )

This means that without beforefieldinit the compiler is very constrained in when it may call the type initializer: it cannot decide to call it a bit earlier, even if that would be more convenient (and would result in faster code).
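The constrained timing can be observed directly. The following is a small sketch with hypothetical types (InitLog, PreciseType, TimingDemo) that records the order of events for a type with a static constructor, that is, one without beforefieldinit. The initializer runs at the first call to a static method, even though that method touches no static field:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper that records events in order.
public static class InitLog
{
    public static readonly List<string> Events = new List<string>();

    public static int Record(string message)
    {
        Events.Add(message);
        return Events.Count;
    }
}

public class PreciseType
{
    // Part of the type initializer (.cctor).
    public static readonly int Marker = InitLog.Record("PreciseType initializer");

    // Static constructor present, so no beforefieldinit: precise timing applies.
    static PreciseType() { }

    // Touches no static field, yet the first call still triggers the initializer.
    public static void Ping()
    {
        InitLog.Record("PreciseType.Ping body");
    }
}

public static class TimingDemo
{
    public static void Main()
    {
        InitLog.Record("before first use");
        PreciseType.Ping();
        foreach (string entry in InitLog.Events)
        {
            Console.WriteLine(entry);
        }
    }
}
```

For a type that does carry beforefieldinit, the position of the initializer entry in this log is not guaranteed, which is exactly the freedom the runtime uses to avoid per-call checks.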

Knowing this, we can see what actually happens in your code. The problematic cases are the static methods, because that is where the type initializer may have to be called. (The instance constructor is another such place, but you are not measuring it.)

I focused on the s_1_Test() method. Since I do not really need it to do anything, I simplified it (to make the generated native code shorter):

    public static object s_1_Test<T>(TestClass<pR> class1, pR local1) where T:IConvertible
    {
        return null;
    }

Now let's look at the disassembly in VS (in Release mode), first without a static constructor, that is, with beforefieldinit :

    00000000  xor         eax,eax
    00000002  ret

Here the result is set to 0 (in a somewhat obfuscated way, for performance reasons) and the method returns. Very simple.

What happens with a static constructor (i.e. without beforefieldinit )?

    00000000  sub         rsp,28h
    00000004  mov         rdx,rcx
    00000007  xor         ecx,ecx
    00000009  call        000000005F8213A0
    0000000e  xor         eax,eax
    00000010  add         rsp,28h
    00000014  ret

This is much more complicated. The real problem is the call instruction, which presumably calls a function that runs the type initializer if necessary.

I believe this is the source of the performance difference between the two situations.

The added check is necessary because your type is generic and you are using it with a reference type as the type argument. In that case the JITed code for the different generic instantiations of your class is shared, but the type initializer has to be called for each instantiation separately. Moving the static methods to another, non-generic type would be one way to solve the problem.
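As a rough sketch of that last suggestion, the static thunk could live in a separate non-generic static class. Everything here is simplified and partly hypothetical: ProxyThunks and CallBaseTest are invented names, a plain Func stands in for the question's InvocationHandler, and only the non-generic Test() overload is shown. Whether this removes the initializer check in practice would still need to be measured:

```csharp
using System;

public class TestClass<R>
{
    public virtual object Test() { return null; }
}

// Hypothetical non-generic holder for the static thunk. It has no static
// fields and no static constructor, so it has no type initializer to run.
public static class ProxyThunks
{
    public static object s_0_Test<pR>(TestClass<pR> class1)
    {
        return ((TestClassProxy<pR>)class1).CallBaseTest();
    }
}

public class TestClassProxy<pR> : TestClass<pR>
{
    // Simplified stand-in for the InvocationHandler field in the question.
    private readonly Func<TestClass<pR>, object> _0_Test;

    // Static constructor kept on purpose to mimic a type without beforefieldinit.
    static TestClassProxy() { }

    public TestClassProxy()
    {
        _0_Test = ProxyThunks.s_0_Test<pR>;
    }

    // Internal rather than private so the external thunk can reach it.
    internal object CallBaseTest() { return base.Test(); }

    public override object Test() { return _0_Test(this); }
}

public static class ThunkDemo
{
    public static void Main()
    {
        object result = new TestClassProxy<string>().Test();
        Console.WriteLine(result == null ? "null" : result.ToString());
    }
}
```

The trade-off is that the helper the thunk calls can no longer be private to the proxy, which matters for a generated type that tries to hide its plumbing.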

^* Unless you do something crazy, like calling an instance method on null using call (rather than callvirt , which throws for null ).

+4

svick Jan 29 '13 at 16:03

Nikolay Khil · Accepted Answer · 2013-01-29T15:55:41+0000

As stated in ECMA-335 (CLI specification), Part I, Section 8.9.5:

The semantics of when and what triggers execution of such type initialization methods, is as follows:
A type can have a type-initializer method, or not.
A type can be specified as having a relaxed semantic for its type-initializer method (for convenience below, we call this relaxed semantic BeforeFieldInit).
If marked BeforeFieldInit then the type's initializer method is executed at, or sometime before, first access to any static field defined for that type.
If not marked BeforeFieldInit then that type's initializer method is executed at (i.e., is triggered by):
a. first access to any static field of that type, or
b. first invocation of any static method of that type, or
c. first invocation of any instance or virtual method of that type if it is a value type, or
d. first invocation of any constructor for that type.

In addition, as you can see from Michael's code above, TestClassProxy has only one static field: _proxy_handlers . Please note that it is used only two times:

In instance constructor
And in the static field initializer itself

Therefore, when BeforeFieldInit is specified, the type initializer will be called only once: in the instance constructor, right before the first access to _proxy_handlers .

But if you omit BeforeFieldInit , the CLR will place a call to the type initializer before every call to a static method of TestClassProxy , every access to one of its static fields, and so on.

In particular, the type initializer will be invoked on every call to the static methods s_0_Test and s_1_Test<T> .

Of course, as stated in ECMA-334 (C# language specification), section 17.11:

The static constructor for a non-generic class executes at most once in a given application domain. The static constructor for a generic class declaration executes at most once for each closed constructed type constructed from the class declaration (§25.1.5).
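The "once for each closed constructed type" rule is easy to see in a small sketch. The types InitCounter, Holder<T> and PerTypeDemo are hypothetical; the static constructor records which closed type it ran for:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper that remembers which closed types were initialized.
public static class InitCounter
{
    public static readonly List<Type> Initialized = new List<Type>();
}

public class Holder<T>
{
    // Runs once for each closed constructed type (Holder<string>, Holder<object>, ...).
    static Holder()
    {
        InitCounter.Initialized.Add(typeof(T));
    }

    public static void Touch() { }
}

public static class PerTypeDemo
{
    public static int Run()
    {
        Holder<string>.Touch();
        Holder<string>.Touch();   // already initialized: the static constructor does not run again
        Holder<object>.Touch();   // a different closed type, so it gets its own initializer run
        Holder<int>.Touch();
        return InitCounter.Initialized.Count;
    }

    public static void Main()
    {
        Console.WriteLine(Run()); // prints 3
    }
}
```

Holder<string> and Holder<object> are separate closed types with separate initializers, which is why code shared between reference-type instantiations cannot assume initialization has already happened.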

But in order to guarantee this, the CLR must check (in a thread-safe manner) if the class is already initialized or not.

And these checks will reduce performance.
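To give a feel for the cost, here is a conceptual analogy written in plain C#. It is not how the CLR implements the check; GuardedInit and its members are hypothetical. It only shows the general shape: every call pays for at least a flag test, even though the initialization itself happens once.

```csharp
using System;

// Conceptual analogy only: a hand-written "run once, thread-safely" guard.
public static class GuardedInit
{
    private static readonly object Gate = new object();
    private static volatile bool _initialized;

    // Counts how often the guarded initialization body actually ran.
    public static int InitializerRuns;

    private static void EnsureInitialized()
    {
        if (_initialized) return;      // fast path, but still a check on every call
        lock (Gate)
        {
            if (_initialized) return;  // another thread finished in the meantime
            InitializerRuns++;         // stands in for the type initializer body
            _initialized = true;
        }
    }

    // Stands in for a trivial static method such as s_0_Test.
    public static object StaticMethod()
    {
        EnsureInitialized();
        return null;
    }

    public static void Main()
    {
        for (int i = 0; i < 1000; i++)
        {
            StaticMethod();
        }
        Console.WriteLine(InitializerRuns);
    }
}
```

When the method body is as trivial as returning null, a guard like this can easily dominate the measured time, which matches the disassembly shown in the other answer.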

PS: You may be surprised to find that the performance problems disappear if you change s_0_Test and s_1_Test<T> to instance methods.
