Recommendations for creating a data model

For the current project, I am creating a data model. Are there any sources where I can find “best practices” for a good data model? Good means flexibility, efficiency, excellent performance, style ... Some examples of questions are “column naming”, “what data should be normalized” or “what attributes should be exported to your own table”. The source must be a book :-)

+7
source share
3 answers

Personally, I think you should read a performance tuning book before starting to model the database. Proper design can change the world. If you are not an expert in performance tuning, you do not have the right to create a database.

These books are database specific, here is one for SQl Server. http://www.amazon.com/Server-Performance-Tuning-Distilled-Experts/dp/1430219025/ref=sr_1_1?s=books&ie=UTF8&qid=1313603282& SR = 1-1

Another book you should read before you start designing is antipatterns. It is always good to know what you should avoid. http://www.amazon.com/SQL-Antipatterns-Programming-Pragmatic-Programmers/dp/1934356557/ref=sr_1_1?s=books&ie=UTF8&qid=1313603622& SR = 1-1

Do not get trapped in design for flexibility. People use this as a way to avoid the job of developing correct and flexible databases that almost always perform poorly. If more than 5% of your database design depends on flexibility, you, in my opinion, did not model it correctly. All of the worst COTS products I've had to work with were first designed to provide flexibility.

Any decent database book will discuss normalization. You can also easily find this information on the Internet. Be sure to create FK / PK relationships.

As for the column designation, select a standard and stick to it consistently. Consistency is more important than the actual standard. Do not specify a column identifier (see SQL Antipattern Book). Use the same names and data types if the columns will be in several different tables. What you are going to do is not use functions to make connections due to data type mismatches.

Always remember that databases can (and will) change outside the application. All that is needed for data integrity should be in the database, not in the application code. Data will be there long after replacing the application.

The most important thing for database design:

  • Careful determination of the required data (including the correct data types) and the relationship between the pieces of data (including the correct normalization)
  • data integrity
  • performance
  • Security
  • consistency (data types, naming standards, etc.).
+8
source

The best book I read about developing database systems was Introduction to Database Systems. Joe Selco SQL for Smarties books is also worth reading. Assuming you're building an application, not just a database, and assuming you're using an object-oriented language, using UML and templates by Craig Larman has a good discussion on mapping databases to objects.

In terms of defining “good,” in my experience, “supported” is probably at the top of the list. Maintaining performance in database design means a lot of things, such as sticking to conventions - I often recommend http://justinsomnia.org/2003/04/essential-database-naming-conventions-and-style/ . Normalization is another obvious support strategy. I often recommend being generous with column types - it is difficult to change the application if you find that the postal codes in different countries are longer than in the USA. I often recommend using views for abstract complex data relationships for less experienced developers.

A key feature of maintainability is the ability to test and deploy. It is worth reading about the continuous integration of databases (http://www.codeproject.com/KB/architecture/Database_CI.aspx), although it is not strictly related to the design of the database schema, this is an important context.

As for performance - I believe that you first need to design for service, and only design for performance, if you know that you have a problem. Sometimes you know in advance that performance will be a serious problem - creating a database for Facebook (or Stack Exchange), creating a database with a huge amount of data (terabytes and above), or a huge number of users. Most systems do not fall into this camp - therefore, I recommend regular performance tests with representative data to find if you have a problem and only tune when you can prove it. Many performance optimizations due to maintainability - denormalization, for example.

Oh, and generally, avoid triggers and stored procedures if you can. This is just my opinion, though ...

+2
source

Even if this is not a book, I recommend reading query estimation methods for large databases . It provides information about query processing, which greatly affects the design of your circuit, especially for intensive (e.g., analytic) workloads. This is less practical, but I believe that every database developer should read it at least once :-).

+1
source

All Articles