How do you know when an SQL database needs more normalization?

Is it when you are trying to retrieve data and there is no obvious, easy way to do it?

How do you recognize that something should be a table of its own?

What are the rules?

+6
sql database sql-server database-design normalization
13 answers

Check out Wikipedia. Its article on database normalization covers the various normal forms (first, second, third, and so on). In most cases you should aim for at least third normal form. There are times when you will want to relax the rules a little (joining several tables may be too expensive, so you may need to denormalize a bit), but for the most part third normal form is a good target.

+7

When you notice that you have to repeat the same data, or when you start using individual fields as arrays.
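One reading of "fields as arrays" is a column that packs several values into a delimited string. Here is a minimal sketch of moving such a column into its own table. It uses Python's built-in sqlite3 module rather than SQL Server so it runs anywhere, and the table and column names (article_flat, article_tag, tags) are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Unnormalized: one column holds a comma-separated "array" of tags.
    CREATE TABLE article_flat (id INTEGER PRIMARY KEY, title TEXT, tags TEXT);
    INSERT INTO article_flat VALUES (1, 'Indexes', 'sql,performance');
    INSERT INTO article_flat VALUES (2, 'Joins', 'sql');

    -- Normalized: one row per (article, tag) pair.
    CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE article_tag (
        article_id INTEGER REFERENCES article(id),
        tag TEXT,
        PRIMARY KEY (article_id, tag)
    );
""")

# Move the data across, splitting the packed column in application code.
flat_rows = conn.execute("SELECT id, title, tags FROM article_flat").fetchall()
for art_id, title, tags in flat_rows:
    conn.execute("INSERT INTO article VALUES (?, ?)", (art_id, title))
    conn.executemany(
        "INSERT INTO article_tag VALUES (?, ?)",
        [(art_id, tag) for tag in tags.split(",")],
    )

# "Which articles are tagged sql?" is now an equality test, not string matching.
sql_articles = [row[0] for row in conn.execute(
    "SELECT article_id FROM article_tag WHERE tag = 'sql' ORDER BY article_id")]
print(sql_articles)
```

With the child table in place, the tag lookup can use an ordinary index instead of a LIKE scan over the packed string.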

+6

It is a rather glib answer, but: when you find that the data is not normalized enough. There are many resources on the Internet about the levels (or, more correctly, "forms") of normalization, and they describe the forms in more detail than I could here. First and second normal form should be treated as more or less mandatory. If you are not in third (or, indeed, fourth) normal form, you need a reasonable justification for why not.

Check out the Wikipedia article on database normalization.

+3

When you start wondering whether your SQL database needs more normalization.

+2

Whenever you have a relational database .... <grin/>

No, seriously, there are rules; see this link.

They are called the normal forms (five of them, or thereabouts). They originate with E. F. Codd, the man who introduced the relational model around 1970.

"The key, the whole key, and nothing but the key, so help me Codd"

Here is a brief overview:

  • First normal form (1NF): the table faithfully represents a relation and has no repeating groups.
  • Second normal form (2NF): no non-prime attribute in the table is functionally dependent on a part (a proper subset) of a candidate key.
  • Third normal form (3NF): every non-prime attribute is non-transitively dependent on every key of the table.
  • Boyce-Codd normal form (BCNF): every non-trivial functional dependency in the table is a dependency on a superkey.
  • Fourth normal form (4NF): every non-trivial multivalued dependency in the table is a dependency on a superkey.
  • Fifth normal form (5NF): every non-trivial join dependency in the table is implied by the superkeys of the table.
  • Domain/key normal form (DKNF), Ronald Fagin (1981): every constraint on the table is a logical consequence of the table's domain constraints and key constraints.
  • Sixth normal form (6NF): the table has no non-trivial join dependencies at all (with reference to the generalized join operator).
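
The definitions above all turn on functional dependencies, so a small checker can make them concrete. This is only a sketch in plain Python over made-up order-line rows, where the candidate key is assumed to be (order_id, product_id); the function name holds is hypothetical. Note that sample data can only refute a dependency, never prove that it holds in general.

```python
from collections import defaultdict

def holds(rows, determinant, dependent):
    """Return True if, in these rows, each determinant value maps to one dependent value."""
    seen = defaultdict(set)
    for row in rows:
        key = tuple(row[col] for col in determinant)
        seen[key].add(tuple(row[col] for col in dependent))
    return all(len(values) == 1 for values in seen.values())

# Hypothetical order lines; the candidate key is assumed to be (order_id, product_id).
rows = [
    {"order_id": 1, "product_id": 10, "product_name": "Widget", "qty": 2},
    {"order_id": 1, "product_id": 11, "product_name": "Gadget", "qty": 1},
    {"order_id": 2, "product_id": 10, "product_name": "Widget", "qty": 5},
]

# product_name is consistent with depending on product_id alone, which is a
# proper subset of the key: the kind of partial dependency that 2NF forbids.
name_on_part_of_key = holds(rows, ["product_id"], ["product_name"])

# qty is not determined by product_id alone; it needs the whole key.
qty_on_part_of_key = holds(rows, ["product_id"], ["qty"])

print(name_on_part_of_key, qty_on_part_of_key)  # True False
```

In this example the usual fix would be to move product_name into a separate product table keyed by product_id.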
+2

Other people have pointed you to the formal normalization rules. Here are some informal warning signs I use; any one of them suggests that more normalization is needed:

  • If you have columns in the table whose names differ only by a number (for example, Phone1 and Phone2).

  • If you have any columns in the table that should be filled in only when another column in the table is filled in.

  • If updating a "fact" in the database (for example, a street address) requires more than one UPDATE.

  • If the same question can get two different answers, depending on which table you get your information from.

  • If the answer to any non-trivial question can be obtained from the database without joining at least two tables.

  • If you have any quantity limits in the database other than "only one is allowed" (that is, "only one address is allowed" is fine, but "only two addresses are allowed" indicates a normalization problem).
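
To illustrate the numbered-column and quantity-limit signs, here is a rough sketch that moves Phone1 and Phone2 into a child table. It uses Python's sqlite3 module instead of SQL Server purely so that it is runnable as is; the person_flat, person, and person_phone tables are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Before: numbered columns cap every person at two phones.
    CREATE TABLE person_flat (
        id INTEGER PRIMARY KEY, name TEXT, Phone1 TEXT, Phone2 TEXT);
    INSERT INTO person_flat VALUES (1, 'Ann', '555-0100', '555-0101');
    INSERT INTO person_flat VALUES (2, 'Bob', '555-0200', NULL);

    -- After: one row per phone number, with no built-in limit.
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE person_phone (
        person_id INTEGER NOT NULL REFERENCES person(id),
        phone TEXT NOT NULL,
        PRIMARY KEY (person_id, phone)
    );

    INSERT INTO person SELECT id, name FROM person_flat;
    INSERT INTO person_phone
        SELECT id, Phone1 FROM person_flat WHERE Phone1 IS NOT NULL
        UNION ALL
        SELECT id, Phone2 FROM person_flat WHERE Phone2 IS NOT NULL;
""")

# A third phone is just another row; no schema change is needed.
conn.execute("INSERT INTO person_phone VALUES (1, '555-0102')")

counts = conn.execute("""
    SELECT p.name, COUNT(ph.phone)
    FROM person p LEFT JOIN person_phone ph ON ph.person_id = p.id
    GROUP BY p.id ORDER BY p.id
""").fetchall()
print(counts)  # [('Ann', 3), ('Bob', 1)]
```

The NULL placeholder for Bob's missing second phone disappears as well, because an absent phone is simply an absent row.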

+2

3NF is usually all you need, and it comes down to three rules:

Each column in the table should depend on:

  • the key (1NF),
  • the whole key (2NF),
  • and nothing but the key (3NF) ("so help me Codd" is how it usually ends).

You can often drop back to 2NF for performance reasons, provided you understand the consequences and only once you actually run into problems, but 3NF should be the initial goal for all your designs.
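
As a sketch of the "nothing but the key" rule, the hypothetical employee_flat table below stores dept_name, which depends on dept_id rather than on the key emp_id. The example uses Python's sqlite3 module with made-up names, and counts how many rows a department rename touches before and after the split.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Not in 3NF: dept_name depends on dept_id, which is not the key.
    CREATE TABLE employee_flat (
        emp_id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER, dept_name TEXT);
    INSERT INTO employee_flat VALUES (1, 'Ann', 10, 'Sales');
    INSERT INTO employee_flat VALUES (2, 'Bob', 10, 'Sales');
    INSERT INTO employee_flat VALUES (3, 'Cy', 20, 'Support');

    -- 3NF: the department name lives in exactly one row.
    CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY, name TEXT,
        dept_id INTEGER REFERENCES department(dept_id));
    INSERT INTO department SELECT DISTINCT dept_id, dept_name FROM employee_flat;
    INSERT INTO employee SELECT emp_id, name, dept_id FROM employee_flat;
""")

# Renaming a department touches every employee row in the flat table ...
flat_rows = conn.execute(
    "UPDATE employee_flat SET dept_name = 'Field Sales' WHERE dept_id = 10").rowcount
# ... but only one row in the normalized design.
norm_rows = conn.execute(
    "UPDATE department SET dept_name = 'Field Sales' WHERE dept_id = 10").rowcount
print(flat_rows, norm_rows)  # 2 1
```

If the flat update ever missed a row, the same department would show two different names, which is the inconsistency that 3NF is meant to rule out.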

+1

As everyone else has said, you know when you start to have (too many) duplicate columns in multiple tables.

That said, it is sometimes useful to keep redundant columns across multiple tables. This can reduce the number of JOINs you have to perform in complex queries. Just be careful to keep all the tables in sync, or you are asking for trouble.
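
One possible way to keep a redundant column in sync is a trigger on the source table. The sketch below uses SQLite trigger syntax through Python's sqlite3 module (SQL Server's trigger syntax differs), and the customer and invoice tables are assumptions made for the example, not anything from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);

    -- customer_name is a deliberate redundant copy that saves a JOIN in reports.
    CREATE TABLE invoice (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(id),
        customer_name TEXT
    );

    -- Refresh the copy whenever the source value changes.
    CREATE TRIGGER customer_name_sync AFTER UPDATE OF name ON customer
    BEGIN
        UPDATE invoice SET customer_name = NEW.name WHERE customer_id = NEW.id;
    END;

    INSERT INTO customer VALUES (1, 'Acme');
    INSERT INTO invoice VALUES (100, 1, 'Acme');
    INSERT INTO invoice VALUES (101, 1, 'Acme');
""")

conn.execute("UPDATE customer SET name = 'Acme Ltd' WHERE id = 1")
names = [row[0] for row in conn.execute(
    "SELECT customer_name FROM invoice ORDER BY id")]
print(names)  # ['Acme Ltd', 'Acme Ltd']
```

The trigger only covers updates to customer.name; inserts into invoice still have to supply the right name themselves, so this reduces the risk of drift rather than removing it.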

+1

Here is a pretty good article. Normalization is a science, not an art. Knowing when to DEnormalize ... now that is an art.

http://www.alvechurchdata.co.uk/hints-and-tips/softnorm.html

0

What level of normalization are you currently using? If you cannot answer that, I suspect your database is an unpleasant mess. I always start a project in third normal form and then denormalize or normalize further if and when necessary.

0

I assume you are talking about a transactional database that supports an interactive application, but for what it's worth ...

OLAP databases that are used solely for reporting and are updated only by ETL processes can benefit from a less normalized structure. In those applications you accept the cost of redundant storage and duplicated data in exchange for better performance from fewer joins and greater ease of use for (sometimes less technical) data analysts and business analysts.

Transactional databases should always be normalized as far as is practical (at least 3NF), and then selectively denormalized only when necessary. The need for denormalization should ideally be based on actual performance test results.

0

When you have to scan huge volumes of data to extract some basic piece of information, for example which product categories exist.

-1
