Where are generic methods stored?

I read some information about generics and noticed one interesting thing.

For example, if I have a generic class:

    class Foo<T>
    {
        public static int Counter;
    }

    Console.WriteLine(++Foo<int>.Counter);    // 1
    Console.WriteLine(++Foo<string>.Counter); // 1

The two classes Foo<int> and Foo<string> are different at runtime. But what about the case where a non-generic class has a generic method?

 class Foo { public void Bar<T>() { } } 

Obviously, there is only one class Foo. But what about the method Bar? All generic classes and methods are closed at runtime over the type arguments they are used with. Does this mean that class Foo has many implementations of Bar, and where is the information about this method stored in memory?

+53
generics c#
Jan 04 '17 at 10:29
3 answers

Unlike C++ templates, .NET generics are evaluated at runtime, not at compile time. Semantically, if you instantiate a generic class with different type arguments, the results behave as if they were two different classes, but under the hood there is only one class in the compiled IL (Intermediate Language) code.

Generic types

The difference between different instances of the same generic type becomes apparent when you use Reflection: typeof(YourClass<int>) is not the same as typeof(YourClass<string>). These are called constructed generic types. There is also typeof(YourClass<>), which represents the generic type definition.
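The distinction between constructed types and the generic type definition can be checked directly with Reflection. The following is a minimal sketch; YourClass<T> is the placeholder name used in the text above, and ReflectionDemo is a hypothetical wrapper class added only for illustration:

```csharp
using System;

class YourClass<T> { }

static class ReflectionDemo
{
    static void Main()
    {
        Type ofInt = typeof(YourClass<int>);
        Type ofString = typeof(YourClass<string>);
        Type definition = typeof(YourClass<>);

        // Two constructed types built from the same definition are distinct Type objects.
        Console.WriteLine(ofInt == ofString);                                 // False

        // Both constructed types point back to the same generic type definition.
        Console.WriteLine(ofInt.GetGenericTypeDefinition() == definition);    // True
        Console.WriteLine(ofString.GetGenericTypeDefinition() == definition); // True

        // Only the definition still has an unbound type parameter.
        Console.WriteLine(definition.IsGenericTypeDefinition);                // True
        Console.WriteLine(ofInt.IsGenericTypeDefinition);                     // False
    }
}
```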

When you instantiate a constructed generic class, the runtime generates the specialized class on the fly. There are subtle differences between how this works for value types and for reference types.

  • The compiler generates only one generic type in the assembly.
  • The runtime creates a separate version of your generic class for each value type you use it with.
  • The runtime allocates a separate set of static fields for each type argument of the generic class.
  • Because all references have the same size, the runtime can reuse the specialized version it generated the first time the class was used with a reference type.

Generic methods

For generic methods, the principles are the same.

  • The compiler generates only one generic method, which is the generic method definition.
  • At run time, each different specialization of the method is treated as a different method of the same class.
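The relationship between the single generic method definition and its runtime specializations can also be observed through Reflection. This is only a sketch: Bar is changed here to return typeof(T) so the specializations can be told apart, and GenericMethodDemo is a hypothetical wrapper class.

```csharp
using System;
using System.Reflection;

class Foo
{
    // Returns the type argument so each closed method can be told apart.
    public Type Bar<T>() => typeof(T);
}

static class GenericMethodDemo
{
    static void Main()
    {
        // The single generic method definition emitted by the compiler.
        MethodInfo definition = typeof(Foo).GetMethod("Bar");
        Console.WriteLine(definition.IsGenericMethodDefinition);   // True

        // Closing the definition over a type argument yields a constructed method.
        MethodInfo barOfInt = definition.MakeGenericMethod(typeof(int));
        MethodInfo barOfString = definition.MakeGenericMethod(typeof(string));
        Console.WriteLine(barOfInt.IsGenericMethodDefinition);     // False

        var foo = new Foo();
        Console.WriteLine(barOfInt.Invoke(foo, null));             // System.Int32
        Console.WriteLine(barOfString.Invoke(foo, null));          // System.String
    }
}
```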
+50
Jan 04 '17 at 10:37

First of all, let's clarify two things. This is a generic method definition:

 T M<T>(T x) { return x; } 

This is a generic type definition:

 class C<T> { } 

Most likely, if I ask you what M is, you will say that it is a generic method that takes a T and returns a T. That is absolutely correct, but I propose a different way of thinking about it: there are two sets of parameters here. One is the type T, the other is the object x. If we combine them, we can say that, in total, this method takes two parameters.




The currying concept tells us that a function that takes two parameters can be converted to a function that takes one parameter and returns another function that takes another parameter (and vice versa). For example, here is a function that takes two integers and returns their sum:

    Func<int, int, int> uncurry = (x, y) => x + y;
    int sum = uncurry(1, 3);

And here is the equivalent form, where we have a function that takes one integer and produces a function that takes another integer and returns the sum of the two integers:

    Func<int, Func<int, int>> curry = x => y => x + y;
    int sum = curry(1)(3);

We went from one function that takes two integers to a function that takes one integer and produces functions. Obviously, these two are not literally the same thing in C#, but they are two different ways of saying the same thing, because passing the same information eventually gets you to the same final result.

Currying makes it easier to reason about functions (one parameter is easier to reason about than two), and it tells us that our conclusions still hold for any number of parameters.
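The "and vice versa" part can be made concrete with a pair of helper functions. This is a small sketch; Curry and Uncurry are hypothetical helper names, not part of the .NET base class library:

```csharp
using System;

static class Currying
{
    // Turns a two-parameter function into a chain of one-parameter functions.
    public static Func<A, Func<B, R>> Curry<A, B, R>(Func<A, B, R> f) => a => b => f(a, b);

    // The inverse transformation.
    public static Func<A, B, R> Uncurry<A, B, R>(Func<A, Func<B, R>> f) => (a, b) => f(a)(b);

    static void Main()
    {
        Func<int, int, int> add = (x, y) => x + y;

        Func<int, Func<int, int>> curried = Curry(add);
        Func<int, int, int> roundTrip = Uncurry(curried);

        Console.WriteLine(add(1, 3));        // 4
        Console.WriteLine(curried(1)(3));    // 4
        Console.WriteLine(roundTrip(1, 3));  // 4
    }
}
```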




Consider for a moment that, on an abstract level, this is what happens here. Let's say M is a “superfunction” that takes a type T and returns a regular method. The returned method takes a value of type T and returns a value of type T.

For example, if we call the superfunction M with an int argument, we get a regular method from int to int :

 Func<int, int> e = M<int>; 

And if we call this regular method with the argument 5, we get 5 back, as expected:

 int v = e(5); 

So, consider the following expression:

 int v = M<int>(5); 

Now do you see why this can be thought of as two separate calls? You can recognize the call to the superfunction because its arguments are passed in <>. It is followed by the call to the returned method, where the arguments are passed in (). This is analogous to the previous example:

 curry(1)(3); 

Similarly, a generic type definition is also a superfunction: it takes a type and returns another type. For example, List<int> is a call to the superfunction List with the argument int, which returns a type that is a list of integers.
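Reflection exposes this "call" quite literally through Type.MakeGenericType. A minimal sketch, with TypeSuperfunctionDemo as a hypothetical wrapper class:

```csharp
using System;
using System.Collections.Generic;

static class TypeSuperfunctionDemo
{
    static void Main()
    {
        // The "superfunction": the open generic type definition List<>.
        Type listDefinition = typeof(List<>);

        // "Calling" it with the argument int yields the closed type List<int>.
        Type listOfInt = listDefinition.MakeGenericType(typeof(int));
        Console.WriteLine(listOfInt == typeof(List<int>));   // True

        // The result is an ordinary type that can be instantiated.
        var list = (List<int>)Activator.CreateInstance(listOfInt);
        list.Add(5);
        Console.WriteLine(list.Count);                        // 1
    }
}
```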

Now, when the C# compiler encounters a regular method, it compiles it as a regular method. It does not try to create different definitions for different possible arguments. So this:

 int Square(int x) => x * x; 

compiles as is. It does not compile as:

    int Square__0() => 0;
    int Square__1() => 1;
    int Square__2() => 4;
    // and so on

In other words, the C# compiler does not evaluate all possible arguments for this method in order to embed them into the final executable; rather, it leaves the method in its parameterized form and trusts that the result will be evaluated at runtime.

Similarly, when the C# compiler encounters a superfunction (a generic method or type definition), it compiles it as a superfunction. It does not try to create different definitions for different possible arguments. So this:

 T M<T>(T x) => x; 

compiles as is. It does not compile as:

    int M(int x) => x;
    int[] M(int[] x) => x;
    int[][] M(int[][] x) => x;
    // and so on
    float M(float x) => x;
    float[] M(float[] x) => x;
    float[][] M(float[][] x) => x;
    // and so on

Again, the C# compiler trusts that when this superfunction is called, it will be evaluated at runtime, and that this evaluation will produce a regular method or type.

This is one of the reasons why C# benefits from having a JIT compiler as part of its runtime. When a superfunction is evaluated, it produces a brand-new method or type that did not exist at compile time! We call this process reification. Afterward, the runtime remembers the result so that it does not have to create it again. This part is called memoization.

Compare this with C++, which does not need a JIT compiler as part of its runtime. The C++ compiler actually has to evaluate superfunctions (called "templates") at compile time. This is feasible because the arguments to the superfunctions are restricted to things that can be evaluated at compile time.




So, to answer your question:

 class Foo { public void Bar() { } } 

Foo is a regular type, and there is only one of it. Bar is a regular method inside Foo, and there is only one of it as well.

 class Foo<T> { public void Bar() { } } 

Foo<T> is a superfunction that creates types at runtime. Each of the resulting types has its own regular method named Bar, and there is only one of it (per type).

 class Foo { public void Bar<T>() { } } 

Foo is a regular type, and there is only one of it. Bar<T> is a superfunction that creates regular methods at runtime. Each of the resulting methods is then considered part of the regular type Foo.

 class Foo<T1> { public void Bar<T2>() { } } 

Foo<T1> is a superfunction that creates types at runtime. Each of the resulting types has its own superfunction named Bar<T2>, which creates regular methods at runtime (at a later time). Each of the resulting methods is considered part of the type that produced the corresponding superfunction.
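The last case, with both levels of type arguments, can be observed with a small sketch. Bar is changed here to return a string describing both type arguments, and NestedGenericsDemo is a hypothetical wrapper class:

```csharp
using System;

class Foo<T1>
{
    // Reports the type argument of the enclosing type and of the method itself.
    public string Bar<T2>() => typeof(T1).Name + "/" + typeof(T2).Name;
}

static class NestedGenericsDemo
{
    static void Main()
    {
        // Same closed type Foo<int>, two different closed methods.
        Console.WriteLine(new Foo<int>().Bar<string>());   // Int32/String
        Console.WriteLine(new Foo<int>().Bar<double>());   // Int32/Double

        // A different closed type has its own Bar<T2> superfunction.
        Console.WriteLine(new Foo<string>().Bar<int>());   // String/Int32
    }
}
```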




The above is a conceptual explanation. On top of that, certain optimizations can be applied to reduce the number of distinct implementations in memory; for example, two constructed methods can share a single machine-code implementation under certain circumstances. See Luaan's answer for why the CLR can do this and when it actually does.

+30
Jan 04 '17 at 12:32

In the IL itself, there is only one “copy” of the code, just as in C#. Generics are fully supported by IL, and the C# compiler does not need to do any tricks. You will find that each reification of a generic type (for example, List<int>) has a separate type, but each still keeps a reference to the original open generic type (for example, List<>). At the same time, by contract, they must behave as if there were separate methods or types for each closed generic. So the simplest solution is to have each closed generic method be a separate method.

Now for the implementation details :) In practice, this is rarely necessary and can be expensive. So what actually happens is that if a single method can handle multiple type arguments, it will. This means that all reference types can use the same method (type safety has already been established at compile time, so there is no need to check it again at runtime), and with a little trickery around static fields, they can share the same "type" as well. For example:

    class Foo<T>
    {
        private static int Counter;
        public static int DoCount() => Counter++;
        public static bool IsOk() => true;
    }

    Foo<string>.DoCount(); // 0
    Foo<string>.DoCount(); // 1
    Foo<object>.DoCount(); // 0

There is only one machine-code method for IsOk, and it can be used by both Foo<string> and Foo<object> (which, of course, also means that calls to this method can be the same). But their static fields are still separate, as required by the CLI specification, which also means that DoCount must refer to two separate fields for Foo<string> and Foo<object>. And yet, when I look at the disassembly (on my computer, mind you; these are implementation details and may vary quite a bit, and it also takes some effort to prevent the inlining of DoCount), there is only one DoCount method. How? The reference to Counter is indirect:

    000007FE940D048E mov rcx, 7FE93FC5C18h   ; Foo<string>
    000007FE940D0498 call 000007FE940D00C8   ; Foo<>.DoCount()
    000007FE940D049D mov rcx, 7FE93FC5C18h   ; Foo<string>
    000007FE940D04A7 call 000007FE940D00C8   ; Foo<>.DoCount()
    000007FE940D04AC mov rcx, 7FE93FC5D28h   ; Foo<object>
    000007FE940D04B6 call 000007FE940D00C8   ; Foo<>.DoCount()

And the DoCount method looks something like this (excluding the prolog and the "I don't want to inline this method" filler):

    000007FE940D0514 mov rcx,rsi                 ; RCX was stored in RSI in the prolog
    000007FE940D0517 call 000007FEF3BC9050       ; Load Foo<actual> address
    000007FE940D051C mov edx,dword ptr [rax+8]   ; EDX = Foo<actual>.Counter
    000007FE940D051F lea ecx,[rdx+1]             ; ECX = RDX + 1
    000007FE940D0522 mov dword ptr [rax+8],ecx   ; Foo<actual>.Counter = ECX
    000007FE940D0525 mov eax,edx
    000007FE940D0527 add rsp,30h
    000007FE940D052B pop rsi
    000007FE940D052C ret

So the code basically “injects” the Foo<string> / Foo<object> dependency: while the calls are different, the method being called is actually the same, just with a bit more indirection. Of course, for our original method ( () => Counter++ ), this would not be a call at all and would not have the extra indirection; it would simply be inlined at the call site.
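The idea of one shared method body receiving the closed type as a hidden argument can be imitated in plain C#. This is purely an analogy for the disassembly above, not how the CLR is implemented; TypeContext, SharedFoo and the two context fields are hypothetical names:

```csharp
using System;

// Hypothetical stand-in for the per-type data the runtime keeps for each closed type.
sealed class TypeContext
{
    public int Counter;   // plays the role of Foo<T>.Counter
}

static class SharedFoo
{
    // One context per closed type, like the two addresses loaded into RCX.
    public static readonly TypeContext FooOfString = new TypeContext();
    public static readonly TypeContext FooOfObject = new TypeContext();

    // A single method body serves every context; the context is passed in explicitly.
    public static int DoCount(TypeContext context) => context.Counter++;

    static void Main()
    {
        Console.WriteLine(DoCount(FooOfString));   // 0
        Console.WriteLine(DoCount(FooOfString));   // 1
        Console.WriteLine(DoCount(FooOfObject));   // 0
    }
}
```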

This is a bit trickier for value types. Fields of reference types are always the same size: the size of a reference. Fields of value types, on the other hand, can have different sizes, for example int vs. long or decimal. Indexing an array of integers requires different machine code than indexing an array of decimals. And since structs can also be generic, the size of a struct can depend on the size of its type arguments:

    struct Container<T>
    {
        public T Value;
    }

    var small = default(Container<double>);  // Can be as small as 8 bytes
    var large = default(Container<decimal>); // Can never be smaller than 16 bytes
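The size difference can be measured rather than taken on faith. This sketch assumes a runtime where System.Runtime.CompilerServices.Unsafe is available in-box (recent .NET versions); SizeDemo is a hypothetical wrapper class:

```csharp
using System;
using System.Runtime.CompilerServices;

struct Container<T>
{
    public T Value;
}

static class SizeDemo
{
    static void Main()
    {
        // Unsafe.SizeOf<T>() reports the size in bytes of a value of type T.
        Console.WriteLine(Unsafe.SizeOf<Container<double>>());    // 8
        Console.WriteLine(Unsafe.SizeOf<Container<decimal>>());   // 16

        // With a reference type argument, the field is just a reference.
        Console.WriteLine(Unsafe.SizeOf<Container<string>>() == IntPtr.Size);   // True
    }
}
```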

If we add value types to our previous example:

    Foo<int>.DoCount();
    Foo<double>.DoCount();
    Foo<int>.DoCount();

We get this code:

    000007FE940D04BB call 000007FE940D00F0   ; Foo<int>.DoCount()
    000007FE940D04C0 call 000007FE940D0118   ; Foo<double>.DoCount()
    000007FE940D04C5 call 000007FE940D00F0   ; Foo<int>.DoCount()

As you can see, while we do not get the extra indirection for the static fields that we saw with reference types, each method is actually entirely separate. The code in the method is shorter (and faster), but it cannot be reused. This is the code for Foo<int>.DoCount():

    000007FE940D058B mov eax,dword ptr [000007FE93FC60D0h]   ; Foo<int>.Counter
    000007FE940D0594 lea edx,[rax+1]
    000007FE940D0597 mov dword ptr [7FE93FC60D0h],edx

Just a simple access to a static field, as if the type were not generic at all; as if we had simply defined class FooOfInt and class FooOfDouble.

In most cases, this is not really important to you. Well-designed generics usually more than pay for their costs, and you cannot make a flat statement about the performance of generics. Using a List<int> will almost always be a better idea than using an ArrayList of ints: you pay the extra memory cost of having multiple sets of List<> methods, but unless you have many different value-type List<>s with no items, the savings will likely outweigh the cost in both memory and time. If you have only one reification of a given generic type (or all the reifications are closed over reference types), you usually will not pay anything extra; there may be a bit of extra indirection if inlining is not possible.

There are a few guidelines for using generics efficiently. The most relevant one here is to keep only the genuinely generic parts generic. As soon as the containing type is generic, everything inside it may also be generic; so if you have 100 KiB of static fields in a generic type, every reification will need to duplicate them. This may be what you want, but it may also be a mistake. The usual approach is to put the non-generic parts in a non-generic static class. The same applies to nested classes: class Foo<T> { class Bar { } } means that Bar is also a generic class (it "inherits" the type argument of its containing class).
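Both points can be demonstrated in a few lines. This is a sketch of the guideline only; SharedData, Wasteful<T>, Lean<T> and LayoutDemo are hypothetical names, and the 1024-element table merely stands in for "a lot of static data":

```csharp
using System;

// Non-generic holder: exists exactly once, no matter how many Lean<T> types are used.
static class SharedData
{
    public static readonly int[] Table = new int[1024];
}

class Wasteful<T>
{
    // Duplicated for every closed type, even though it does not depend on T.
    public static readonly int[] Table = new int[1024];

    // Implicitly generic: Wasteful<int>.Nested and Wasteful<string>.Nested differ.
    public class Nested { }
}

class Lean<T>
{
    // Forwards to the single shared table.
    public static int[] Table => SharedData.Table;
}

static class LayoutDemo
{
    static void Main()
    {
        // Each closed Wasteful<T> owns a separate array.
        Console.WriteLine(object.ReferenceEquals(Wasteful<int>.Table, Wasteful<string>.Table)); // False

        // Every closed Lean<T> sees the same array.
        Console.WriteLine(object.ReferenceEquals(Lean<int>.Table, Lean<string>.Table));         // True

        // The nested class is a different type for each type argument of its container.
        Console.WriteLine(typeof(Wasteful<int>.Nested) == typeof(Wasteful<string>.Nested));     // False
    }
}
```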

On my computer, even if I keep the DoCount method free of anything generic (replacing Counter++ with just 42), the code is still the same; the compilers do not try to eliminate unnecessary “genericity”. If you need to use many different reifications of one generic type, this can add up quickly, so consider keeping such methods apart; putting them in a non-generic base class or a static extension method might be worthwhile. But, as always with performance, measure first. It is probably not a problem.

+15
Jan 04 '17 at 12:53


