Entity Framework 6: what is the best implementation for a base object with 10 child objects?

We have a base object with 10 child objects, built with EF6 code first.

Of these 10 child objects, 5 have only a few additional properties, and 5 have several (from 5 to 20). We implemented this as table per type (TPT), so we have one table for the base type and one for each child (11 tables in total).

This, however, creates HUGE select queries with CASE statements and UNIONs everywhere, and also causes EF to take about 6 seconds to generate the query (the first time).

I have read about this problem, and the same issue apparently exists with the table-per-concrete-type (TPC) approach.

So what remains is table per hierarchy (TPH), but that creates a single table with a lot of properties, which is not great either.

Is there any other solution for this?

I was thinking about maybe skipping the inheritance entirely and creating a union view for when I want to get all the elements from all the child types at once.

Any other thoughts?

Thanks in advance.

+8
Tags: c#, entity-framework, entity-framework-6
6 answers

Another solution would be to implement some kind of CQRS pattern, where you have separate databases for writing (command) and reading (query). You could even denormalize the data in the read database, so it is very fast.

Assuming you need at least one normalized model with referential integrity, I think your choice really comes down to table per hierarchy and table per type. TPH is reported by Alex James of the EF team, and more recently on Microsoft's Data Development site, to have better performance.

Advantages of TPT and why they are not as important as performance:

Greater flexibility, meaning the ability to add types without affecting any existing table. Not a big concern, because EF migrations make it trivial to generate the SQL needed to update existing databases without affecting data.

Database validation, because there are fewer nullable fields. Not much of a concern, since EF validates data according to the application model. If data is added by other means, it is not hard to run a background script to verify it. Moreover, TPT and TPC are actually worse for validation when it comes to primary keys, because two subclass tables could potentially contain the same primary key, leaving you to solve that validation problem by other means.

Storage space is reduced, because you do not need to store all the null fields. This is a fairly trivial concern, especially if the DBMS has a good strategy for handling sparse columns.

Design and gut feeling. Having one very large table feels a bit wrong, but that is probably because most database developers have spent many hours normalizing data and drawing ERDs. Having one large table seems to contradict the basic principles of database design. This is probably the biggest barrier to TPH. See this article for a particularly passionate argument.

This article summarizes the main argument against TPH as:

It is not normalized even in a trivial sense, it makes it impossible to enforce data integrity, and, most "impressively", it is practically guaranteed to perform badly at scale for any non-trivial data set.

This is mostly wrong. Performance and integrity are covered above, and TPH does not necessarily mean denormalization. It just means many nullable, self-referencing foreign-key columns. So we can keep designing and normalizing the data in the same way as we would with TPT. In my current database I have many relationships between subtypes, and I created the ERD as if it were a TPT inheritance structure. This actually reflects the implementation in code-first Entity Framework. For example, here is my Expenditure class, which inherits from Relationship, which inherits from Content:

```csharp
public class Expenditure : Relationship
{
    /// <summary>
    /// Inherits from Content: Id, Handle, Description, Parent (is context of expenditure and usually
    /// a Project)
    /// Inherits from Relationship: Source (the Principal), SourceId, Target (the Supplier), TargetId
    /// </summary>
    [Required, InverseProperty("Expenditures"), ForeignKey("ProductId")]
    public Product Product { get; set; }
    public Guid ProductId { get; set; }

    public string Unit { get; set; }
    public double Qty { get; set; }
    public string Currency { get; set; }
    public double TotalCost { get; set; }
}
```

InversePropertyAttribute and ForeignKeyAttribute give EF the information it needs to make the required self-join on a single table.

The Product type is also mapped to the same table (it, too, inherits from Content). Each Product has its own row in the table, and rows that contain Expenditures will include data in the ProductId column, which is null for rows containing all the other types. So the data is normalized, just placed in one table.

The beauty of using EF code first is that we design the database in exactly the same way, and implement it in (almost) exactly the same way, regardless of whether we use TPH or TPT. To change the implementation from TPH to TPT, we just need to add an annotation to each subclass, mapping them to new tables. So the good news for you is that it does not really matter which one you choose. Just build it, generate a stack of test data, test it, change strategy, and test it again. I reckon you will find TPH the winner.
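To illustrate that last point, here is a minimal sketch (the property lists are trimmed to a few of the members mentioned above, so treat the exact table names as assumptions): with code first, switching between TPH and TPT is just a matter of adding or removing [Table] annotations on the subclasses.

```csharp
using System;
using System.ComponentModel.DataAnnotations.Schema;

// With no [Table] attributes on the subclasses, EF code first maps the whole
// hierarchy to a single table with a discriminator column (TPH).
public class Content
{
    public Guid Id { get; set; }
    public string Handle { get; set; }
}

// Adding a [Table] annotation to each subclass is the only change needed to
// switch the same model to TPT: each subclass then gets its own table.
[Table("Relationships")]
public class Relationship : Content
{
    public Guid SourceId { get; set; }
    public Guid TargetId { get; set; }
}

[Table("Expenditures")]
public class Expenditure : Relationship
{
    public Guid ProductId { get; set; }
    public double TotalCost { get; set; }
}
```

Remove the two [Table] attributes and the very same classes map back to one table, so both strategies can be benchmarked against the same model.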

+6

I have similar problems, and a few suggestions. I am also open to improvements on these suggestions, as this is a complex topic and I do not have it all worked out.

Entity Framework can be very slow when dealing with non-trivial queries on complex entities, i.e. those with several levels of child collections. In some performance tests I tried, it sits there for a very long time compiling the query. In theory, EF 5 onward should cache compiled queries (even if the context gets disposed and re-instantiated) without you having to do anything, but I am not sure this is always the case.

I have read suggestions that you should create multiple DataContexts with smaller subsets of your database entities for a complex database. If that is practical for you, give it a try! But I imagine there would be maintenance issues with that approach.

1) I know this is obvious, but worth saying anyway: make sure you have the appropriate foreign keys defined in your database for related entities, as Entity Framework will then track these relationships and generate queries much faster where a join on the foreign key is needed.

2) Do not retrieve more than you need. A one-size-fits-all method for getting a complex object is rarely optimal. Say you are getting a list of base objects (to put in a list), and you only need to display the name and ID of those objects. Retrieve only the base object: navigation properties that are not specifically needed should not be fetched.

3) If the child objects are not collections, or if they are collections but you only need a single item (or an aggregate value such as a count), I would absolutely create a view in the database and query that instead. It is MUCH faster. EF does not have to do any work; it is all done in the database, which is better equipped for this type of operation.

4) Be careful with .Include(); this goes back to point 2 above. If you are retrieving a single object plus a child-collection property, it may be best not to use .Include() and to fetch the child collection with a separate query instead (so as not to get all the columns of the base object repeated for every row in the child collection).
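Point 2 can be sketched like this (BaseObject and its members are made up for illustration). Against EF the same Select would be translated into SQL that retrieves only the projected columns; here it is shown over an in-memory list so the shape of the query is clear:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class BaseObject
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string LargeDescription { get; set; }   // wide column the list view never shows
}

public static class ListQueries
{
    // Project only the columns the list actually displays. With an EF
    // IQueryable source, this Select keeps the generated SQL down to
    // Id and Name - no navigation properties, no wide columns.
    public static List<(int Id, string Name)> GetListItems(IEnumerable<BaseObject> source)
    {
        return source
            .Select(b => (b.Id, b.Name))
            .ToList();
    }
}
```

The same principle applies to any "get list for display" method: a dedicated projection per screen usually beats one method that always loads the full object graph.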

EDIT

Following the comments here, some further thoughts.

Since we are dealing with an inheritance hierarchy, it makes logical sense to have separate tables for the additional properties of the inheriting classes, plus a table for the base class. How to make Entity Framework perform well with that, though, is still up for debate.

I have used EF for a similar scenario (but with fewer children), database-first; in that case, however, I did not use the actual generated Entity classes as my business objects. The EF objects related directly to the DB tables.

I created separate business classes for the base and inheriting classes, and a set of Mappers that convert between them. A query would look something like this:

```csharp
public static List<BaseClass> GetAllItems()
{
    using (var db = new MyDbEntities())
    {
        var q1 = db.InheritedClass1.Include("BaseClass").ToList()
            .ConvertAll(x => (BaseClass)InheritedClass1Mapper.MapFromContext(x));

        var q2 = db.InheritedClass2.Include("BaseClass").ToList()
            .ConvertAll(x => (BaseClass)InheritedClass2Mapper.MapFromContext(x));

        return q1.Union(q2).ToList();
    }
}
```

I am not saying this is the best approach, but it might be a starting point? The queries are certainly quick to compile in this case!

Comments are welcome!

+4

With table per hierarchy you end up with only one table, so obviously your CRUD operations will be faster, and this table is abstracted away by your domain layer anyway. The downside is that you lose the ability to use NOT NULL constraints, so this needs to be handled properly by your business layer to avoid potential data-integrity problems. Also, adding or removing entities means the table changes; but that is manageable as well.

With table per type, the problem is that the more classes you have in the hierarchy, the slower your CRUD operations become.

All in all, since performance is probably the most important consideration here and you have many classes, I think table per hierarchy is the winner, both in terms of performance and simplicity, and considering your number of classes.

Also see this article, in particular section 7.1.1 ("Avoid TPT in Model First or Code First applications"), where they state: "when creating an application using Model First or Code First, you should avoid TPT inheritance for performance concerns."

+3

The EF6 Code First model I am working on uses generics and an abstract base class called BaseEntity. I also use generics and a base class for the EntityTypeConfiguration classes.

Where I need to reuse several "column" properties in some tables, and it makes no sense for them to be on BaseEntity or BaseEntityWithMetaData, I create an interface for them.

E.g. I have one for addresses that I have not finished yet. So if an entity has address information, it implements IAddressInfo. Casting an entity to IAddressInfo then gives me an object with just its address info on it.
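A sketch of what that interface approach might look like. The answer does not give the members of IAddressInfo, so the properties here and the Supplier entity are assumptions, not the author's actual code:

```csharp
using System;

// Hypothetical shape of the IAddressInfo interface described above:
// shared "column" properties that several entities carry.
public interface IAddressInfo
{
    string Street { get; set; }
    string City { get; set; }
    string PostalCode { get; set; }
}

// Any entity whose table has address columns implements the interface...
public class Supplier : IAddressInfo
{
    public int SupplierId { get; set; }
    public string Street { get; set; }
    public string City { get; set; }
    public string PostalCode { get; set; }
}

public static class AddressFormatter
{
    // ...so shared logic can work against IAddressInfo without knowing
    // which concrete entity it received.
    public static string OneLine(IAddressInfo a) =>
        $"{a.Street}, {a.City} {a.PostalCode}";
}
```

This keeps the shared columns out of the base classes while still letting code treat all address-bearing entities uniformly.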

Initially I had the metadata columns as their own tables. But, as others have mentioned, the queries were horrific and it was slower than slow. So I thought, why don't I just use several inheritance paths to support what I want to do, so that the columns are in every table that needs them and not in those that don't. Also, I use MySQL, which has a column limit of 4096; SQL Server 2008 has 1024. Even at 1024, I do not see a realistic scenario exceeding that in a single table.

And none of my entities inherit in a way that gives them columns they do not need. When such a need arises, I create a new base class at that level to prevent the extra columns.

Here are enough snippets from my code to understand how I have the inheritance set up. So far this has worked really well for me, and I have not come up with a scenario I could not model with this setup.

```csharp
public class BaseEntityConfig<T> : EntityTypeConfiguration<T>
    where T : BaseEntity<T>, new()
{
}

public abstract class BaseEntity<T>
    where T : BaseEntity<T>, new()
{
    // shared properties here
}

public class BaseEntityWithMetaDataConfig<T> : BaseEntityConfig<T>
    where T : BaseEntityWithMetaData<T>, new()
{
    public BaseEntityWithMetaDataConfig()
    {
        this.HasOptional(e => e.RecCreatedBy).WithMany().HasForeignKey(p => p.RecCreatedByUserId);
        this.HasOptional(e => e.RecLastModifiedBy).WithMany().HasForeignKey(p => p.RecLastModifiedByUserId);
    }
}

public abstract class BaseEntityWithMetaData<T> : BaseEntity<T>
    where T : BaseEntityWithMetaData<T>, new()
{
    #region Entity Properties
    public DateTime? DateRecCreated { get; set; }
    public DateTime? DateRecModified { get; set; }
    public long? RecCreatedByUserId { get; set; }
    public virtual User RecCreatedBy { get; set; }
    public virtual User RecLastModifiedBy { get; set; }
    public long? RecLastModifiedByUserId { get; set; }
    public DateTime? RecDateDeleted { get; set; }
    #endregion
}

public class PersonConfig : BaseEntityWithMetaDataConfig<Person>
{
    public PersonConfig()
    {
        this.ToTable("people");
        this.HasKey(e => e.PersonId);
        this.HasOptional(e => e.User).WithRequired(p => p.Person).WillCascadeOnDelete(true);
        this.HasOptional(p => p.Employee).WithRequired(p => p.Person).WillCascadeOnDelete(true);
        this.HasMany(e => e.EmailAddresses).WithRequired(p => p.Person).WillCascadeOnDelete(true);
        this.Property(e => e.FirstName).IsRequired().HasMaxLength(128);
        this.Property(e => e.MiddleName).IsOptional().HasMaxLength(128);
        this.Property(e => e.LastName).IsRequired().HasMaxLength(128);
    }
}

// I have to use this pattern to allow other classes to inherit from Person;
// they have to inherit from BasePerson<T>.
public class Person : BasePerson<Person>
{
    // Just a dummy class to expose BasePerson as it is.
}

public class BasePerson<T> : BaseEntityWithMetaData<T>
    where T : BasePerson<T>, new()
{
    #region Entity Properties
    public long PersonId { get; set; }
    public virtual User User { get; set; }
    public string FirstName { get; set; }
    public string MiddleName { get; set; }
    public string LastName { get; set; }
    public virtual Employee Employee { get; set; }
    public virtual ICollection<PersonEmail> EmailAddresses { get; set; }
    #endregion

    #region Entity Helper Properties
    [NotMapped]
    public PersonEmail PrimaryPersonalEmail
    {
        get
        {
            PersonEmail ret = null;
            if (this.EmailAddresses != null)
                ret = (from e in this.EmailAddresses
                       where e.EmailAddressType == EmailAddressType.Personal_Primary
                       select e).FirstOrDefault();
            return ret;
        }
    }

    [NotMapped]
    public PersonEmail PrimaryWorkEmail
    {
        get
        {
            PersonEmail ret = null;
            if (this.EmailAddresses != null)
                ret = (from e in this.EmailAddresses
                       where e.EmailAddressType == EmailAddressType.Work_Primary
                       select e).FirstOrDefault();
            return ret;
        }
    }

    private string _DefaultEmailAddress = null;

    [NotMapped]
    public string DefaultEmailAddress
    {
        get
        {
            if (string.IsNullOrEmpty(_DefaultEmailAddress))
            {
                PersonEmail personalEmail = this.PrimaryPersonalEmail;
                if (personalEmail != null && !string.IsNullOrEmpty(personalEmail.EmailAddress))
                    _DefaultEmailAddress = personalEmail.EmailAddress;
                else
                {
                    PersonEmail workEmail = this.PrimaryWorkEmail;
                    if (workEmail != null && !string.IsNullOrEmpty(workEmail.EmailAddress))
                        _DefaultEmailAddress = workEmail.EmailAddress;
                }
            }
            return _DefaultEmailAddress;
        }
    }
    #endregion

    #region Constructor
    static BasePerson() { }

    public BasePerson()
    {
        this.User = null;
        this.EmailAddresses = new HashSet<PersonEmail>();
    }

    public BasePerson(string firstName, string lastName)
    {
        this.FirstName = firstName;
        this.LastName = lastName;
    }
    #endregion
}
```

Now the code in the context's OnModelCreating looks like:

```csharp
// Config
modelBuilder.Conventions.Remove<PluralizingTableNameConvention>();

// Initialize configuration; each line tells Entity Framework how to create
// the relationships between the different tables in the database:
// table names, foreign key constraints, unique constraints, all relations, etc.
modelBuilder.Configurations.Add(new PersonConfig());
modelBuilder.Configurations.Add(new PersonEmailConfig());
modelBuilder.Configurations.Add(new UserConfig());
modelBuilder.Configurations.Add(new LoginSessionConfig());
modelBuilder.Configurations.Add(new AccountConfig());
modelBuilder.Configurations.Add(new EmployeeConfig());
modelBuilder.Configurations.Add(new ContactConfig());
modelBuilder.Configurations.Add(new ConfigEntryCategoryConfig());
modelBuilder.Configurations.Add(new ConfigEntryConfig());
modelBuilder.Configurations.Add(new SecurityQuestionConfig());
modelBuilder.Configurations.Add(new SecurityQuestionAnswerConfig());
```

The reason I created base classes for my entity configurations is that when I started down this path, I ran into an annoying problem. I had to set up the common properties for every derived class over and over again, and if I updated one of the fluent API mappings, I had to update the code in every configuration class.

But by using this inheritance approach in the configuration classes, the two properties are configured in one place and inherited by the configuration classes of derived entities.

So when PersonConfig runs, it executes the logic in the BaseEntityWithMetaData configuration class to configure the two properties, and again when UserConfig runs, and so on.

+2

The three different approaches have their own names in M. Fowler's terminology:

  • Single Table Inheritance - the entire inheritance hierarchy is stored in a single table. No joins; nullable columns for the child types. You need a way to determine which child type a row represents.

  • Concrete Table Inheritance - you have a table for each concrete type. No joins; no nullable columns, since the base columns are repeated in each concrete table. A table for the base type is needed only if the base type itself can be instantiated.

  • Class Table Inheritance - you have a base-type table, and child tables that each add only their additional columns. Joins; no nullable columns. Here the base-type table always contains a row for each child; however, you can retrieve just the common columns when you do not need the child-specific ones (lazy loading might be of use here?).

All three approaches are workable; it depends only on the amount and structure of the data you have, so you may want to measure the performance difference first.

The choice will come down to the number of joins, and data duplication versus nullable columns.

  • If you do not (and will not) have many child types, I would go with Class Table Inheritance, since it stays close to the domain and is easy to translate/map.
  • If you have many child types that you work with at the same time, and expect joins to become a bottleneck, go with Single Table Inheritance.
  • If joins are not needed at all, and you will work with one concrete type at a time, go with Concrete Table Inheritance.
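The list above can be made concrete with a minimal sketch of the first pattern (all class, column, and discriminator names here are hypothetical): with Single Table Inheritance, every row carries the columns of every child type plus a discriminator, and materializing an entity means branching on that discriminator.

```csharp
using System;

// Entity hierarchy (names are illustrative only).
public abstract class Party
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class PersonParty : Party  { public string LastName { get; set; } }
public class CompanyParty : Party { public string TaxNumber { get; set; } }

// One flat row shape for the whole hierarchy: all columns present,
// with a discriminator saying which child type the row represents.
public class PartyRow
{
    public int Id { get; set; }
    public string Discriminator { get; set; }  // "Person" or "Company"
    public string Name { get; set; }
    public string LastName { get; set; }       // null for Company rows
    public string TaxNumber { get; set; }      // null for Person rows

    // Materializing a row means inspecting the discriminator - this is the
    // "determine which child type a row represents" step from the list above.
    public Party ToEntity() => Discriminator switch
    {
        "Person"  => new PersonParty  { Id = Id, Name = Name, LastName = LastName },
        "Company" => new CompanyParty { Id = Id, Name = Name, TaxNumber = TaxNumber },
        _ => throw new InvalidOperationException($"Unknown type {Discriminator}")
    };
}
```

Class Table Inheritance would instead split LastName and TaxNumber into their own joined tables, and Concrete Table Inheritance would duplicate Id and Name into one full table per child type.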
+1

Although table per hierarchy (TPH) is the best approach for fast CRUD operations, there is no way to avoid a single table with that many properties in the resulting database. The CASE and UNION clauses you describe are generated because the resulting query effectively asks for a polymorphic result set that includes several types.

However, when EF returns the flattened table containing data for all the types, it does extra work to make sure null values are returned for the columns that are not relevant to a particular type. Technically, this extra validation using CASE and UNION is not needed. This issue is logged as a performance bug with Microsoft for EF6, and they aim to deliver a fix in a future version.

The following query:

```sql
SELECT
    [Extent1].[CustomerId] AS [CustomerId],
    [Extent1].[Name] AS [Name],
    [Extent1].[Address] AS [Address],
    [Extent1].[City] AS [City],
    CASE WHEN (( NOT (([UnionAll1].[C3] = 1) AND ([UnionAll1].[C3] IS NOT NULL)))
          AND ( NOT (([UnionAll1].[C4] = 1) AND ([UnionAll1].[C4] IS NOT NULL))))
         THEN CAST(NULL AS varchar(1))
         WHEN (([UnionAll1].[C3] = 1) AND ([UnionAll1].[C3] IS NOT NULL))
         THEN [UnionAll1].[State]
    END AS [C2],
    CASE WHEN (( NOT (([UnionAll1].[C3] = 1) AND ([UnionAll1].[C3] IS NOT NULL)))
          AND ( NOT (([UnionAll1].[C4] = 1) AND ([UnionAll1].[C4] IS NOT NULL))))
         THEN CAST(NULL AS varchar(1))
         WHEN (([UnionAll1].[C3] = 1) AND ([UnionAll1].[C3] IS NOT NULL))
         THEN [UnionAll1].[Zip]
    END AS [C3]
FROM [dbo].[Customers] AS [Extent1]
```

can be safely replaced with:

```sql
SELECT
    [Extent1].[CustomerId] AS [CustomerId],
    [Extent1].[Name] AS [Name],
    [Extent1].[Address] AS [Address],
    [Extent1].[City] AS [City],
    [UnionAll1].[State] AS [C2],
    [UnionAll1].[Zip] AS [C3]
FROM [dbo].[Customers] AS [Extent1]
```

Once this issue is fixed in Entity Framework 6, the queries generated for TPH should become simpler.

+1
