Another option would be to implement some kind of CQRS pattern, where you have separate databases for writing (command) and reading (query). You could even de-normalize the data in the read database so that it is very fast.
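As a rough illustration of that idea (the class and property names below are hypothetical examples, not part of the original design), the write side stays normalized while the read side stores a flattened projection that can be queried without joins:

```csharp
using System;

// Write (command) model: normalized, referential integrity via foreign keys.
public class Order
{
    public Guid Id { get; set; }
    public Guid CustomerId { get; set; }   // FK to a Customer table
    public decimal Total { get; set; }
}

// Read (query) model: a denormalized copy kept in a separate store and
// refreshed whenever the write model changes, so queries need no joins.
public class OrderSummaryView
{
    public Guid OrderId { get; set; }
    public string CustomerName { get; set; }  // copied from Customer
    public decimal Total { get; set; }
}
```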
Assuming you need at least one normalized model with referential integrity, I think your choice really comes down to Table per Hierarchy (TPH) and Table per Type (TPT). TPH is reported by Alex James of the EF team, and more recently on Microsoft's Data Development site, to have better performance.
Advantages of TPT and why they are not as important as performance:
Greater flexibility, which means the ability to add types without affecting any existing table. Not too much of a concern, because EF migrations make it trivial to generate the SQL required to update existing databases without affecting data.
Database validation, on account of having fewer nullable fields. Not a massive concern, because EF validates data according to the application model. If data is added by other means, it is not too difficult to run a background script to validate the data (a sketch of such a check follows this list). Also, TPT and TPC are actually worse for validation when it comes to primary keys, because two sub-class tables could potentially contain the same primary key. You are left with the problem of validation by other means.
Storage space is reduced, because you do not need to store all the null fields. This is only a very trivial concern, especially if the DBMS has a good strategy for handling sparse columns.
Design and gut feel. Having one very large table does feel a bit wrong, but that is probably because most database designers have spent many hours normalizing data and drawing ERDs. Having one large table seems to go against the basic principles of database design. This is probably the biggest barrier to TPH. See this article for a particularly impassioned argument.
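As an aside on the validation point above, a background check of that kind can be quite simple. The sketch below assumes EF6 and the Expenditure/ProductId example used later in this answer; the stand-in entity class and the method name are purely illustrative.

```csharp
using System;
using System.Data.Entity;   // EF6
using System.Linq;

// Minimal stand-in so the sketch compiles on its own; in practice this is the
// real entity class from the model.
public class Expenditure
{
    public Guid Id { get; set; }
    public Guid? ProductId { get; set; }
}

public static class BackgroundValidation
{
    // Flags Expenditure rows whose ProductId is missing - a rule the single
    // TPH table cannot enforce with a NOT NULL constraint.
    public static void Run(DbContext db)
    {
        var invalid = db.Set<Expenditure>()
                        .Where(e => e.ProductId == null)
                        .Select(e => e.Id)
                        .ToList();

        foreach (var id in invalid)
            Console.WriteLine("Expenditure {0} has no ProductId", id);
    }
}
```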
The article linked above summarizes the core argument against TPH as:
"It is not normalized even in a trivial sense, it makes it impossible to enforce data integrity, and, what is most 'amazing': it is virtually guaranteed to perform badly at scale for any non-trivial set of data."
This is mostly wrong. Performance and integrity are addressed above, and TPH does not necessarily mean denormalized. There are just many nullable foreign key columns that are self-referential, so we can go on designing and normalizing the data exactly as we would otherwise. In my current database I have many relationships between sub-types and have created an ERD as if it were a TPT inheritance structure. This actually reflects the implementation in code-first Entity Framework. For example, here is my Expenditure class, which inherits from Relationship, which in turn inherits from Content:
```csharp
public class Expenditure : Relationship { /* ... */ }
```
The InversePropertyAttribute and ForeignKeyAttribute provide EF with the information required to make the necessary self-joins in the single database.
The Product type also maps to the same table (it also inherits from Content). Each Product has its own row in the table, and rows that contain Expenditures include data in the ProductId column, which is null for rows containing all other types. So the data is normalized, just placed in one table.
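Pulling those pieces together, a minimal sketch of what such a model might look like (the base-class members and most property names here are assumptions for illustration, not the original code):

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel.DataAnnotations.Schema;

public class Content
{
    public Guid Id { get; set; }

    // Inverse end of the self-referencing relationship declared below.
    [InverseProperty("Subject")]
    public virtual ICollection<Relationship> RelationshipsAsSubject { get; set; }
}

public class Relationship : Content
{
    // Nullable, self-referencing foreign key back into the same TPH table.
    public Guid? SubjectId { get; set; }

    [ForeignKey("SubjectId")]
    [InverseProperty("RelationshipsAsSubject")]
    public virtual Content Subject { get; set; }
}

public class Product : Content
{
}

public class Expenditure : Relationship
{
    // Populated only on Expenditure rows; null for every other type
    // sharing the single TPH table.
    public Guid? ProductId { get; set; }

    [ForeignKey("ProductId")]
    public virtual Product Product { get; set; }
}
```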
The beauty of using EF Code First is that we design the database in exactly the same way, and we implement it in (almost) exactly the same way, regardless of using TPH or TPT. To change the implementation from TPH to TPT we simply need to add an annotation to each sub-class, mapping each one to a new table. So the good news for you is that it does not really matter which one you choose. Just build it, generate a stack of test data, test it, change strategy, test it again. I reckon you will find TPH the winner.
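For reference, the annotation change mentioned above is roughly this (a sketch only; the table names are just examples):

```csharp
using System.ComponentModel.DataAnnotations.Schema;

// Giving each type in the hierarchy its own table name switches EF code-first
// from the default single-table TPH mapping to TPT.
[Table("Contents")]
public class Content { /* ... */ }

[Table("Relationships")]
public class Relationship : Content { /* ... */ }

[Table("Expenditures")]
public class Expenditure : Relationship { /* ... */ }

// Remove the [Table] attributes and the same model maps back to one TPH table
// with a discriminator column.
```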