This applies mainly to ASP.NET applications where data is not accessed through SOA. That is, you work with objects loaded from the framework rather than transfer objects, although some recommendations still apply.
This is a community post, so please add to it as you see fit.
Applies to: Entity Framework 1.0, as shipped with Visual Studio 2008 SP1.
Why pick EF in the first place?
Given that this is a young technology with plenty of problems (see below), it can be a hard sell to get on the EF bandwagon for your project. However, it is the technology Microsoft is pushing (at the expense of Linq2Sql, which is a subset of EF). In addition, you may not be satisfied with NHibernate or other solutions out there. Whatever the reasons, there are people (including me) working with EF, and life is not bad. Think about it.
EF and inheritance
The first big subject is inheritance. EF supports mapping for inherited classes that are persisted in two ways: table per class and table per hierarchy. The modeling is easy, and there are no programming issues with that part.
(The following applies to the table per class model, as I have no experience with table per hierarchy, which is limited anyway.) The real problem arises when you run queries that include one or more objects that are part of an inheritance tree: the generated SQL is incredibly awful, takes a long time for EF to parse, and takes a long time to execute as well. This is a real show stopper, enough that EF should probably not be used with inheritance, or as little as possible.
Here is an example of how bad it was. My EF model had ~30 classes, ~10 of which were part of an inheritance tree. When running a query to get one item from the base class, something as simple as Base.Get(id), the generated SQL was over 50,000 characters. Then, when you try to return some associations, it degenerates even more, going as far as throwing SQL exceptions about not being able to query more than 256 tables at once.
OK, this is bad. The EF concept is to let you create your object structure without (or with as little as possible) consideration for the actual database implementation of your tables. It completely fails at this.
So, recommendations? Avoid inheritance if you can; performance will be much better. Use it sparingly where you have to. In my opinion, this makes EF a glorified SQL-generation tool for querying, but there are still advantages to using it, and there are ways to implement mechanisms that are similar to inheritance.
Bypassing inheritance with interfaces
The first thing to know when trying to get some kind of inheritance going with EF is that you cannot give an EF-modeled class a non-EF base class. Don't even try it: it will get overwritten by the modeler. So what to do?
You can use interfaces to enforce that classes implement some functionality. For example, here is an IEntity interface that lets you define associations between EF entities where you don't know at design time what the type of the entity will be.
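
    public enum EntityTypes { Unknown = -1, Dog = 0, Cat }

    public interface IEntity
    {
        int EntityID { get; }
        string Name { get; }
        EntityTypes EntityType { get; }
    }

    public partial class Dog : IEntity
    {
        // EntityID and Name can map to fields from your EF model.
        public EntityTypes EntityType { get { return EntityTypes.Dog; } }
    }
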
Using this IEntity, you can then work with undefined associations in other classes (for example, a Person that stores a pet's ID and PetType), with the help of a small lookup class:

    public class IEntityController
    {
        static public IEntity Get(int id, EntityTypes type)
        {
            switch (type)
            {
                case EntityTypes.Dog: return Dog.Get(id);
                case EntityTypes.Cat: return Cat.Get(id);
                default: throw new Exception("Invalid EntityType");
            }
        }
    }

Not as neat as having plain inheritance, particularly considering that you have to store the PetType in an extra database field, but given the performance gains, I would not look back.
It also cannot model one-to-many or many-to-many relationships, but with creative use of Union it could be made to work. Finally, it creates the side effect of loading data in a property/function of the object, which you need to be careful about. Using a clear naming convention such as GetXYZ() helps in that regard.
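To make the interface approach concrete, here is a minimal, self-contained sketch of a class resolving a pet from a stored ID and type. All names in it (PetKind, IPet, PetDog, PetCat, PetLookup, PersonRow) are hypothetical, and the dictionaries merely stand in for the Dog.Get(id) and Cat.Get(id) database calls; it is not taken from a real EF model.

```csharp
using System;
using System.Collections.Generic;

public enum PetKind { Dog, Cat }

public interface IPet
{
    int PetID { get; }
    string Name { get; }
    PetKind Kind { get; }
}

public sealed class PetDog : IPet
{
    public int PetID { get; set; }
    public string Name { get; set; }
    public PetKind Kind { get { return PetKind.Dog; } }
}

public sealed class PetCat : IPet
{
    public int PetID { get; set; }
    public string Name { get; set; }
    public PetKind Kind { get { return PetKind.Cat; } }
}

public static class PetLookup
{
    // In-memory stand-ins for the per-type database lookups (assumption).
    static readonly Dictionary<int, string> DogNames =
        new Dictionary<int, string> { { 1, "Bud" } };
    static readonly Dictionary<int, string> CatNames =
        new Dictionary<int, string> { { 2, "Tom" } };

    // Dispatches on the stored type, like the switch in IEntityController.
    public static IPet Get(int id, PetKind kind)
    {
        switch (kind)
        {
            case PetKind.Dog: return new PetDog { PetID = id, Name = DogNames[id] };
            case PetKind.Cat: return new PetCat { PetID = id, Name = CatNames[id] };
            default: throw new ArgumentOutOfRangeException("kind");
        }
    }
}

// Stores only the pet's key and type, as two plain columns would.
public sealed class PersonRow
{
    public int PetID { get; set; }
    public PetKind PetType { get; set; }

    // A method rather than a property, to signal that calling it loads data.
    public IPet GetPet() { return PetLookup.Get(PetID, PetType); }
}
```

A PersonRow with PetID = 2 and PetType = PetKind.Cat resolves to a PetCat without the caller knowing the concrete type up front.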
Compiled Queries
Entity Framework performance is not as good as direct database access with ADO (obviously) or Linq2SQL. There are ways to improve it, however, one of which is compiling your queries. The performance of a compiled query is similar to Linq2Sql.
What is a compiled query? It is simply a query for which you tell the framework to keep the parsed tree in memory, so it does not need to be regenerated the next time you run it. On the next run, you save the time it takes to parse the tree. Do not discount that: it is a very costly operation that gets even worse with more complex queries.
There are two ways to compile a query: creating an ObjectQuery with EntitySQL and using the CompiledQuery.Compile() function. (Note that by using an EntityDataSource in your page, you will in fact be using ObjectQuery with EntitySQL, so that gets compiled and cached.)
An aside here in case you don't know what EntitySQL is. It is a string-based way of writing queries against EF. Here is an example: "select value dog from Entities.DogSet as dog where dog.ID = @ID". The syntax is pretty similar to SQL syntax. You can also do pretty complex object manipulation, which is well explained here [1].
So, here is how to do it using ObjectQuery<>:

    string query = "select value dog " +
                   "from Entities.DogSet as dog " +
                   "where dog.ID = @ID";

    ObjectQuery<Dog> oQuery = new ObjectQuery<Dog>(query, EntityContext.Instance);
    oQuery.Parameters.Add(new ObjectParameter("ID", id));
    oQuery.EnablePlanCaching = true;
    return oQuery.FirstOrDefault();

The first time you run this query, the framework generates the expression tree and keeps it in memory, so the next time it is executed you save that costly step. In this example EnablePlanCaching = true, which is unnecessary since that is the default option.
The other way to compile a query for later use is the CompiledQuery.Compile method. It uses a delegate:

    static readonly Func<Entities, int, Dog> query_GetDog =
        CompiledQuery.Compile<Entities, int, Dog>((ctx, id) =>
            ctx.DogSet.FirstOrDefault(it => it.ID == id));

or using LINQ query syntax:

    static readonly Func<Entities, int, Dog> query_GetDog =
        CompiledQuery.Compile<Entities, int, Dog>((ctx, id) =>
            (from dog in ctx.DogSet where dog.ID == id select dog).FirstOrDefault());

To call the query:

    query_GetDog.Invoke(YourContext, id);

The advantage of CompiledQuery is that the syntax of your query is checked at compile time, whereas EntitySQL is not. However, there are other considerations...
Includes
Let's say you want the data for the dog's owner to be returned by the query, to avoid making 2 calls to the database. Easy to do, right?
EntitySQL

    string query = "select value dog " +
                   "from Entities.DogSet as dog " +
                   "where dog.ID = @ID";

    ObjectQuery<Dog> oQuery = new ObjectQuery<Dog>(query, EntityContext.Instance).Include("Owner");
    oQuery.Parameters.Add(new ObjectParameter("ID", id));
    oQuery.EnablePlanCaching = true;
    return oQuery.FirstOrDefault();

CompiledQuery

    static readonly Func<Entities, int, Dog> query_GetDog =
        CompiledQuery.Compile<Entities, int, Dog>((ctx, id) =>
            (from dog in ctx.DogSet.Include("Owner") where dog.ID == id select dog).FirstOrDefault());

Now, what if you want the Include to be parameterized? What I mean is that you want a single Get() function that is called from different pages that care about different relationships for the dog. One cares about the Owner, another about his FavoriteFood, another about his FavoriteToy, and so on. Basically, you want to tell the query which associations to load.
It is easy to do with EntitySQL:

    public Dog Get(int id, string include)
    {
        string query = "select value dog " +
                       "from Entities.DogSet as dog " +
                       "where dog.ID = @ID";

        ObjectQuery<Dog> oQuery = new ObjectQuery<Dog>(query, EntityContext.Instance)
            .IncludeMany(include);
        oQuery.Parameters.Add(new ObjectParameter("ID", id));
        oQuery.EnablePlanCaching = true;
        return oQuery.FirstOrDefault();
    }

The include simply uses the passed string. Easy enough. Note that it is possible to improve on the Include(string) function (which accepts only a single path) with an IncludeMany(string) that lets you pass a comma-separated string of associations to load. Look further in the extension section for this function.
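For illustration only, the idea behind IncludeMany can be sketched without EF as follows. The helper name IncludeHelper and the include delegate are assumptions made so the example is self-contained; with EF, TQuery would be ObjectQuery<T> and the delegate would be (q, path) => q.Include(path). The actual extension referred to above may well differ.

```csharp
using System;

public static class IncludeHelper
{
    // Splits a comma-separated list of paths and applies `include` once per path.
    // TQuery stands in for ObjectQuery<T> (assumption, to avoid an EF dependency).
    public static TQuery IncludeMany<TQuery>(
        TQuery query, string includes, Func<TQuery, string, TQuery> include)
    {
        // Nothing to include: hand the query back unchanged.
        if (string.IsNullOrEmpty(includes))
            return query;

        foreach (string path in includes.Split(','))
        {
            string trimmed = path.Trim();
            if (trimmed.Length > 0)
                query = include(query, trimmed);   // one Include call per path
        }
        return query;
    }
}
```

Passing "Owner, FavoriteFood" results in two calls to the delegate, one with "Owner" and one with "FavoriteFood", which is exactly what a single Include(string) call cannot express.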
If we try to do it with CompiledQuery, we run into numerous problems:
The obvious

    static readonly Func<Entities, int, string, Dog> query_GetDog =
        CompiledQuery.Compile<Entities, int, string, Dog>((ctx, id, include) =>
            (from dog in ctx.DogSet.Include(include) where dog.ID == id select dog).FirstOrDefault());

will choke when called with:

    query_GetDog.Invoke(YourContext, id, "Owner,FavoriteFood");

because, as mentioned above, Include() only wants to see a single path in the string, and here we are giving it two: "Owner" and "FavoriteFood" (not to be confused with "Owner.FavoriteFood"!).
Then let's use IncludeMany(), which is an extension function:

    static readonly Func<Entities, int, string, Dog> query_GetDog =
        CompiledQuery.Compile<Entities, int, string, Dog>((ctx, id, include) =>
            (from dog in ctx.DogSet.IncludeMany(include) where dog.ID == id select dog).FirstOrDefault());

Wrong again, this time because EF cannot parse IncludeMany: it is not one of the functions it recognizes, it is an extension.
So, you want to pass an arbitrary number of paths to your function, and Include() only takes a single one. What to do? You could decide that you will never need more than, say, 20 includes, and pass in each separated string in a struct to the CompiledQuery. But now the query looks like this:

    from dog in ctx.DogSet.Include(include1).Include(include2).Include(include3)
        .Include(include4).Include(include5).Include(include6)
        .[...].Include(include19).Include(include20)
    where dog.ID == id
    select dog

which is awful as well. OK, but wait a minute. Can't we return an ObjectQuery<> from the CompiledQuery and then set the includes on that? Well, that is what I would have thought as well:

    static readonly Func<Entities, int, ObjectQuery<Dog>> query_GetDog =
        CompiledQuery.Compile<Entities, int, ObjectQuery<Dog>>((ctx, id) =>
            (ObjectQuery<Dog>)(from dog in ctx.DogSet where dog.ID == id select dog));

    public Dog GetDog(int id, string include)
    {
        ObjectQuery<Dog> oQuery = query_GetDog(YourContext, id);
        oQuery = oQuery.IncludeMany(include);
        return oQuery.FirstOrDefault();
    }

That should have worked, except that when you call IncludeMany (or Include, Where, OrderBy...), you invalidate the cached compiled query, because it is an entirely new query now! So the expression tree needs to be reparsed, and you take that performance hit again.
So what is the solution? You simply cannot use CompiledQueries with parameterized Includes. Use EntitySQL instead. This does not mean that there are no uses for CompiledQueries. They are great for localized queries that will always be called in the same context. Ideally CompiledQuery should always be used, because the syntax is checked at compile time, but due to these limitations that is not possible.
An example of use would be a page that queries which two dogs have the same favorite food. That is a bit narrow for a BusinessLayer function, so you put it in your page and know exactly what type of includes are required.
Passing more than 3 parameters to a CompiledQuery
Func is limited to 5 type parameters, of which the last one is the return type and the first one is your Entities object from the model. That leaves you with 3 parameters. A pittance, but it can be improved on very easily.

    public struct MyParams
    {
        public string param1;
        public int param2;
        public DateTime param3;
    }

    static readonly Func<Entities, MyParams, IEnumerable<Dog>> query_GetDog =
        CompiledQuery.Compile<Entities, MyParams, IEnumerable<Dog>>((ctx, myParams) =>
            from dog in ctx.DogSet
            where dog.Age == myParams.param2
                && dog.Name == myParams.param1
                && dog.BirthDate > myParams.param3
            select dog);

    public List<Dog> GetSomeDogs(int age, string name, DateTime birthDate)
    {
        MyParams myParams = new MyParams();
        myParams.param1 = name;
        myParams.param2 = age;
        myParams.param3 = birthDate;
        return query_GetDog(YourContext, myParams).ToList();
    }

Return types (this does not apply to EntitySQL queries, as they are not compiled at the same point during execution as with the CompiledQuery method)
Working with LINQ, you usually don't force execution of the query until the very last moment, in case some other function downstream wants to change the query in some way:

    static readonly Func<Entities, int, string, IEnumerable<Dog>> query_GetDog =
        CompiledQuery.Compile<Entities, int, string, IEnumerable<Dog>>((ctx, age, name) =>
            from dog in ctx.DogSet where dog.Age == age && dog.Name == name select dog);

    public IEnumerable<Dog> GetSomeDogs(int age, string name)
    {
        return query_GetDog(YourContext, age, name);
    }

    public void DataBindStuff()
    {
        IEnumerable<Dog> dogs = GetSomeDogs(4, "Bud");
        // Downstream code then modifies the query, for example by sorting it:
        dogs = dogs.OrderBy(it => it.BirthDate);
    }

What is going to happen here? By still playing with the original ObjectQuery (that is the actual return type of the LINQ statement, which implements IEnumerable), it will invalidate the compiled query and force a reparse. So the rule of thumb is to return a List<> of objects instead.

    static readonly Func<Entities, int, string, IEnumerable<Dog>> query_GetDog =
        CompiledQuery.Compile<Entities, int, string, IEnumerable<Dog>>((ctx, age, name) =>
            from dog in ctx.DogSet where dog.Age == age && dog.Name == name select dog);

    public List<Dog> GetSomeDogs(int age, string name)
    {
        return query_GetDog(YourContext, age, name).ToList();
    }

When you call ToList(), the query gets executed as per the compiled query, and then, later, the OrderBy is executed against the objects in memory. It may be a little slower, but I'm not even sure. One sure thing is that you have no worries about mishandling the ObjectQuery and invalidating the compiled query plan.
Again, this is not a blanket statement. ToList() is a defensive programming trick, but if you have a valid reason not to use ToList(), go ahead. There are many cases in which you would want to refine the query before executing it.
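The difference between handing back a live query and handing back a list can be seen in plain LINQ to Objects. The sketch below is only an analogy, not EF code: the DeferredDemo class and its FilterCalls counter are hypothetical and stand in for "work done by the query". It does not reproduce plan invalidation; it only shows that a deferred sequence does its work each time a caller enumerates it, whereas a list is a one-time snapshot.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    // Counts how many times the filter runs.
    public static int FilterCalls;

    static bool IsAge(int candidate, int age)
    {
        FilterCalls++;
        return candidate == age;
    }

    // Deferred: returns a query object; no filtering happens until it is enumerated.
    public static IEnumerable<int> GetDeferred(int[] ages, int age)
    {
        return ages.Where(a => IsAge(a, age));
    }

    // Materialized: the filter runs once per element here; callers get a snapshot.
    public static List<int> GetList(int[] ages, int age)
    {
        return ages.Where(a => IsAge(a, age)).ToList();
    }
}
```

With the deferred version, every downstream enumeration repeats the filtering; with the list version, downstream code such as an OrderBy only touches objects already in memory.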
Performance
What is the performance impact of compiling a query? It can actually be fairly large. A rule of thumb is that compiling and caching the query for reuse takes at least double the time of simply executing it without caching. For complex queries (read: inheritance), I have seen upwards of 10 seconds.
So, the first time a precompiled query gets called, you get a performance hit. After that first hit, performance is noticeably better than the same query without precompilation. Practically the same as Linq2Sql.
When you load a page with precompiled queries for the first time, you get a hit. It will load in maybe 5-15 seconds (obviously, more than one precompiled query will end up being called), while subsequent loads take less than 300 ms. That is a dramatic difference, and it is up to you to decide whether it is OK for your first user to take the hit or whether you want a script to call your pages to force compilation of the queries.
Can this query be cached?

    {
        Dog dog = (from d in YourContext.DogSet where d.ID == id select d).FirstOrDefault();
    }

No, ad-hoc LINQ queries are not cached, and you will incur the cost of generating the tree every single time you call it.
Parameterized Queries
Most search capabilities involve heavily parameterized queries. There are even libraries available that let you build a parameterized query out of lambda expressions. The problem is that you cannot use precompiled queries with those. One way around that is to map out all the possible criteria in the query and flag which ones you want to use:

    public struct MyParams
    {
        public string name;
        public bool checkName;
        public int age;
        public bool checkAge;
    }

    static readonly Func<Entities, MyParams, IEnumerable<Dog>> query_GetDog =
        CompiledQuery.Compile<Entities, MyParams, IEnumerable<Dog>>((ctx, myParams) =>
            from dog in ctx.DogSet
            // A criterion whose flag is off must evaluate to true, otherwise it filters out every row.
            where (!myParams.checkAge || dog.Age == myParams.age)
                && (!myParams.checkName || dog.Name == myParams.name)
            select dog);

    protected List<Dog> GetSomeDogs()
    {
        MyParams myParams = new MyParams();
        myParams.name = "Bud";
        myParams.checkName = true;
        myParams.age = 0;
        myParams.checkAge = false;
        return query_GetDog(YourContext, myParams).ToList();
    }

The advantage is that you get all the benefits of a precompiled query. The disadvantages are that you will most likely end up with a where clause that is pretty difficult to maintain, that you will incur a bigger penalty for precompiling the query, and that each query you run is not as efficient as it could be (particularly with joins thrown in).
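One thing to watch in a flag-based where clause is that a criterion whose flag is off has to evaluate to true; otherwise it silently filters out every row. A minimal LINQ to Objects sketch of that predicate shape follows. The DogRow, Criteria and DogSearch names are hypothetical, and the in-memory list only stands in for the DogSet; this is not a compiled EF query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class DogRow
{
    public string Name;
    public int Age;
}

public struct Criteria
{
    public string Name;
    public bool CheckName;
    public int Age;
    public bool CheckAge;
}

public static class DogSearch
{
    // Each criterion is skipped (treated as true) when its flag is off.
    public static List<DogRow> Find(IEnumerable<DogRow> dogs, Criteria c)
    {
        return dogs
            .Where(d => (!c.CheckAge || d.Age == c.Age)
                     && (!c.CheckName || d.Name == c.Name))
            .ToList();
    }
}
```

With both flags off every dog is returned, and switching a flag on narrows the result by that criterion only.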
Another way is to build an EntitySQL query piece by piece, like we all did with SQL.

    protected List<Dog> GetSomeDogs(string name, int age)
    {
        string query = "select value dog from Entities.DogSet as dog where 1 = 1 ";
        if (!String.IsNullOrEmpty(name))
            query = query + " and dog.Name == @Name ";
        if (age > 0)
            query = query + " and dog.Age == @Age ";

        ObjectQuery<Dog> oQuery = new ObjectQuery<Dog>(query, YourContext);
        if (!String.IsNullOrEmpty(name))
            oQuery.Parameters.Add(new ObjectParameter("Name", name));
        if (age > 0)
            oQuery.Parameters.Add(new ObjectParameter("Age", age));

        return oQuery.ToList();
    }

Here are the problems:
- There is no syntax check at compile time.
- Each different combination of parameters generates a different query, which needs to be precompiled when it is first run. In this case there are only 4 possible queries (no parameters, age only, name only, and both parameters), but you can see that there can be many more with a real-world search.
- Nobody likes to concatenate strings!
Another option is to query a large subset of the data and then narrow it down in memory. This is particularly useful if you are working with a definite subset of the data, like all the dogs in a city. You know there are a lot, but you also know there aren't that many... so your CityDog search page can load all the dogs for the city in memory, which is a single precompiled query, and then refine the results.

    protected List<Dog> GetSomeDogs(string name, int age, string city)
    {
        string query = "select value dog from Entities.DogSet as dog where dog.Owner.Address.City == @City ";
        ObjectQuery<Dog> oQuery = new ObjectQuery<Dog>(query, YourContext);
        oQuery.Parameters.Add(new ObjectParameter("City", city));

        List<Dog> dogs = oQuery.ToList();

        // Refine the results in memory.
        if (!String.IsNullOrEmpty(name))
            dogs = dogs.Where(it => it.Name == name).ToList();
        if (age > 0)
            dogs = dogs.Where(it => it.Age == age).ToList();

        return dogs;
    }

This is particularly useful when you start out displaying all the data and then allow filtering.
Problems:
- It could lead to serious data transfer if you are not careful about your subset.
- You can only filter on the data that you returned. This means that if you don't return the Dog.Owner association, you will not be able to filter on Dog.Owner.Name.
So, what is the best solution? There isn't any. You need to pick the solution that works best for you and your problem:
- Use lambda-based query building when you don't care about precompiling your queries.
- Use a fully defined precompiled LINQ query when your object structure is not too complex.
- Use EntitySQL/string concatenation when the structure could be complex and when the possible number of different resulting queries is small (which means fewer precompilation hits).
- Use in-memory filtering when you are working with a smallish subset of the data or when you had to fetch all of the data at first anyway (if the performance is fine with all the data, then filtering in memory will not cause any time to be spent in the db).
Singleton access
The best way to deal with your context and entities across all your pages is to use the singleton pattern:

    public sealed class YourContext
    {
        private const string instanceKey = "On3GoModelKey";

        YourContext() { }

        public static YourEntities Instance
        {
            get
            {
                HttpContext context = HttpContext.Current;
                if (context == null)
                    return Nested.instance;

                // One entities object per request, stored in HttpContext.Items.
                if (context.Items[instanceKey] == null)
                {
                    YourEntities entity = new YourEntities();
                    context.Items[instanceKey] = entity;
                }
                return (YourEntities)context.Items[instanceKey];
            }
        }

        class Nested
        {
            // Fallback instance used when there is no HttpContext.
            internal static readonly YourEntities instance = new YourEntities();
        }
    }

NoTracking, is it worth it?
When executing a query, you can tell the framework whether or not to track the objects it returns. What does that mean? With tracking enabled (the default option), the framework tracks what is going on with the object (has it been modified? created? deleted?) and also links objects together when further queries are made against the database, which is what is of interest here.
For example, let's assume that the Dog with ID == 2 has an owner whose ID == 10.

    Dog dog = (from d in YourContext.DogSet where d.ID == 2 select d).FirstOrDefault();
    // dog.OwnerReference.IsLoaded == false;
    Person owner = (from o in YourContext.PersonSet where o.ID == 10 select o).FirstOrDefault();
    // dog.OwnerReference.IsLoaded == true;

If we did the same without tracking, the result would be different.

    ObjectQuery<Dog> oDogQuery = (ObjectQuery<Dog>)
        (from d in YourContext.DogSet where d.ID == 2 select d);
    oDogQuery.MergeOption = MergeOption.NoTracking;
    Dog dog = oDogQuery.FirstOrDefault();
    // dog.OwnerReference.IsLoaded == false;

    ObjectQuery<Person> oPersonQuery = (ObjectQuery<Person>)
        (from o in YourContext.PersonSet where o.ID == 10 select o);
    oPersonQuery.MergeOption = MergeOption.NoTracking;
    Person owner = oPersonQuery.FirstOrDefault();
    // dog.OwnerReference.IsLoaded == false;

Tracking is very useful, and in a perfect world without performance issues it would always be on. But in this world there is a price for it in terms of performance. So, should you use NoTracking to speed things up? It depends on what you are planning to use the data for.
Is there any chance that the data you query with NoTracking could be used to make updates/inserts/deletes in the database? If so, don't use NoTracking, because associations are not tracked and this will cause exceptions to be thrown.
On a page where there are absolutely no updates to the database, you can use NoTracking.
Mixing tracking and NoTracking is possible, but it requires you to be extra careful with updates/inserts/deletes. The problem is that if you mix, you risk having the framework try to Attach() a NoTracking object to a context where another copy of the same object exists with tracking on. Basically, what I am saying is that in

    Dog dog1 = (from d in YourContext.DogSet where d.ID == 2 select d).FirstOrDefault();

    ObjectQuery<Dog> oDogQuery = (ObjectQuery<Dog>)
        (from d in YourContext.DogSet where d.ID == 2 select d);
    oDogQuery.MergeOption = MergeOption.NoTracking;
    Dog dog2 = oDogQuery.FirstOrDefault();

dog1 and dog2 are two different objects, one tracked and one not. Using the detached object in an update/insert will force an Attach(), which will say "Wait a minute, I already have an object here with the same database key. Fail". And when you Attach() one object, its entire hierarchy gets attached as well, causing problems everywhere. Be extra careful.
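The Attach() failure is easier to picture with a toy model of what a tracking context does. The sketch below is not EF's implementation: ToyContext, DogRow and ReadFromStore are hypothetical names, and the "database read" just builds a new object. It only illustrates the rule described above: one tracked instance per key, detached copies on the side, and a refusal to attach a second instance with the same key.

```csharp
using System;
using System.Collections.Generic;

public sealed class DogRow
{
    public int ID;
    public string Name;
}

// Toy context: keeps one tracked instance per key.
public sealed class ToyContext
{
    readonly Dictionary<int, DogRow> tracked = new Dictionary<int, DogRow>();

    // Stand-in for a database read: always builds a fresh object.
    static DogRow ReadFromStore(int id)
    {
        return new DogRow { ID = id, Name = "Dog" + id };
    }

    // Tracked load: the first read is remembered, later reads return the same instance.
    public DogRow GetTracked(int id)
    {
        DogRow dog;
        if (!tracked.TryGetValue(id, out dog))
        {
            dog = ReadFromStore(id);
            tracked[id] = dog;
        }
        return dog;
    }

    // NoTracking-style load: a detached copy the context knows nothing about.
    public DogRow GetDetached(int id)
    {
        return ReadFromStore(id);
    }

    // Attaching fails when a different instance with the same key is already tracked.
    public void Attach(DogRow dog)
    {
        DogRow existing;
        if (tracked.TryGetValue(dog.ID, out existing) && !ReferenceEquals(existing, dog))
            throw new InvalidOperationException(
                "An object with key " + dog.ID + " is already tracked.");
        tracked[dog.ID] = dog;
    }
}
```

Loading key 2 once with GetTracked and once with GetDetached yields two distinct objects, and attaching the detached one throws, which mirrors the dog1/dog2 situation.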
How much faster is it with NoTracking?
It depends on the queries.
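So, should I use NoTracking everywhere then?
Not really. Tracking has its advantages: the context keeps one instance per entity key for as long as the YourEntities object lives which, with the singleton code above, is the lifetime of the page request. One page request == one YourEntity object.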
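What happens when you use NoTracking and load the same object several times? You get a separate copy each time. How often do/should you ask for the same object during a single page request? As little as possible, of course, but it does happen.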
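Also, remember the part above about associations being connected automatically? You do not get that with NoTracking, so if you load your data in several batches, the objects will not be linked to each other: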

    ObjectQuery<Dog> oDogQuery = (ObjectQuery<Dog>)(from dog in YourContext.DogSet select dog);
    oDogQuery.MergeOption = MergeOption.NoTracking;
    List<Dog> dogs = oDogQuery.ToList();

    ObjectQuery<Person> oPersonQuery = (ObjectQuery<Person>)(from o in YourContext.PersonSet select o);
    oPersonQuery.MergeOption = MergeOption.NoTracking;
    List<Person> owners = oPersonQuery.ToList();

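In this case, no dog will have its .Owner property set.
Keep these things in mind when you are trying to optimize performance.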
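No lazy loading, so what do I do?
This can be seen as a blessing in disguise. Of course, it is annoying to load everything manually. However, it reduces the number of calls to the db and forces you to think about when you should load data. The more you can load in one database call, the better. That was always true, but it is now enforced by this "feature" of EF.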
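Of course, you can call if (!ObjectReference.IsLoaded) ObjectReference.Load(); whenever you want to, but it is better practice to have the framework load the objects you know you will need in one shot. This is where the discussion about parameterized Includes begins to make sense.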
Let's say you have your Dog object:

    public class Dog
    {
        public Dog Get(int id)
        {
            return YourContext.DogSet.FirstOrDefault(it => it.ID == id);
        }
    }

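This is the kind of function you work with all the time. It gets called from all over the place, and once you have that Dog object, you do very different things with it in different functions. First, it should be precompiled, because you call it so often. Second, each page wants access to a different subset of the Dog data. Some want the Owner, some the FavoriteToy, etc.
Of course, you could call Load() for each reference whenever you need one, but that generates a call to the database each time. Bad idea. So instead, each page asks for the data it wants to see when it first requests the Dog object: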
static public Dog Get(int id) { return GetDog(entity,"");} static public Dog Get(int id, string includePath) { string query = "select value o " + " from YourEntities.DogSet as o " +