Is adding an "ID" field to database tables considered an error with respect to third normal form?

Can I add an ID field as the primary key of every table in my database and use it to establish relationships between the tables? Would such a design be considered 3NF (third normal form)? If so, is this approach theoretically recommended or not?

+6
database database-design
5 answers

The problem is that the question is asked somewhat in isolation. But since you are asking whether the approach is theoretically sound (and possibly standards-compliant), the answer cannot be given in isolation.

If so, is it theoretically recommended or not?

No. It has no academic or theoretical foundation. It violates the basic rules of relational database design, and therefore (a) it will not produce a relational database, and (b) whatever is produced will lack the relational capabilities that users expect to exercise directly (without going through the application) with the many simple relational database tools available.

In fact, this is, unfortunately, a very common quick-and-dirty way of taking the tables an application developer has defined for his application and putting them into a database container such as MS SQL, without doing any of the real database design or modeling work needed for the contents of the container to qualify as a relational database. It is fine for a prototype or proof of concept, but not ready for any form of development (SQL coding).

Can I add the ID field as a primary key to all my database tables and use it to establish relationships between the tables?

Wait. By definition, these cannot be "database tables." Database tables go through a formal modeling process and, as a result, have strong identifiers, and their relationships are already defined; in that case the question would never arise. Since it is being asked, the things you are asking about are nowhere close to "database tables." They are simply one application developer's files for a single application.

Adding an FK constraint to one spreadsheet to link it to another spreadsheet, and adding an ID column as the PK, does not create a relational database. It merely takes advantage of SQL's ability to link otherwise unrelated spreadsheets inside the container. They remain unrelated spreadsheets, "linked" by the added "ID" column.

The result is significant data duplication; update anomalies; many more indexes; larger datasets to join; poor performance; massive overuse of temporary tables; and complex SQL, all of which can be avoided with a genuine database design.

Will this design be considered a 3NF design (third normal form)?

Normalization is part of the database design process (not all of it). 3NF is reached through that process. 3NF, or any other NF, is not a label that can be stuck onto a set of spreadsheets or partially designed container contents without going through the process and thereby earning the badge. One does not "look at" a bunch of spreadsheets or partially engineered content and declare it 3NF; one evaluates whether the normalization rules have been followed, and if none are violated, the result can fairly be labeled 3NF. Since the normalization process was not followed, there is no reason to believe the result conforms to any normal form.

Likewise, beyond normalization: if the rules of relational database design were respected and not violated during the process, compliance with relational database standards is achieved. Since the relational methodology was not followed, there is no reason to believe the result conforms to any relational database standard, or that any relational capability can be expected from it.

Understanding the whole problem

"IDs" are surrogate keys. A surrogate key is always (you are right) an additional key and an additional index, on top of the pre-existing natural PK whose role it usurps. Of course, this carries a significant performance cost on every access.

Some respondents believe that a surrogate key can be used in place of the natural key. That, of course, is false, and thankfully you understand this, so it need not be discussed here.

The notion of "all surrogate keys" or "no surrogate keys" is black-and-white, all-or-nothing thinking: normal for children, but unacceptable for adults, especially those working in IT, which demands precision and understanding. A small child typically believes that "if Dad won't let me do what I want, he doesn't love me," and therefore "if he doesn't love me, he hates me." Most of us understand that life is a bit more complicated than it seemed in elementary school. Developers who "love" to see an "ID" on every table and "hate" its absence on some tables are, by definition, unable to consider the database as a whole or the needs of other developers and users; they think only about simplified code, one table at a time.

Nor is it about shades of gray or fuzzy definitions. The definitions have not changed in 30 years (they have been extended and refined, but not changed). Shades of gray merely let developers avoid compliance with the standards, so that approach is not recommended either.

What is a genuine relational database?

In truth, if the database were honestly modeled and designed by a qualified data modeler, using methodologies that have been available for 30 years, the result would be a genuinely relational database; if that process is not followed, the result is neither relational nor a database. The identifiers and relationships would already be defined, and the meaning and context would be carried into the various tables. The data would be normalized (3NF, BCNF or 5NF), and there would be no update anomalies. At the last step, as part of the formal process rather than outside it, when translating the logical model to the physical one, the modeler may improve performance for some identifiers by adding surrogate keys, which avoids carrying large (wide) keys into related child tables (1). This shows again, from a different angle, why the notion of "no surrogates" or "all surrogates" is childish and has no place in a genuine design process.
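A minimal sketch of that last physical step, using Python's built-in sqlite3 module and two hypothetical tables (region and city, not taken from the question). With the natural composite key, the child table must carry both parent key columns; after a surrogate region_id is added, it carries one column, while the parent still keeps its natural key as a UNIQUE constraint.

```python
import sqlite3

# Hypothetical parent/child pair keyed by a natural composite key.
NATURAL = """
CREATE TABLE region (
    country_code TEXT NOT NULL,
    region_code  TEXT NOT NULL,
    name         TEXT NOT NULL,
    PRIMARY KEY (country_code, region_code)
);
CREATE TABLE city (
    country_code TEXT NOT NULL,
    region_code  TEXT NOT NULL,
    city_name    TEXT NOT NULL,
    PRIMARY KEY (country_code, region_code, city_name),
    FOREIGN KEY (country_code, region_code)
        REFERENCES region (country_code, region_code)
);
"""

# The same pair after adding a surrogate at the physical level.
SURROGATE = """
CREATE TABLE region (
    region_id    INTEGER PRIMARY KEY,          -- added surrogate
    country_code TEXT NOT NULL,
    region_code  TEXT NOT NULL,
    name         TEXT NOT NULL,
    UNIQUE (country_code, region_code)         -- natural key is retained
);
CREATE TABLE city (
    region_id INTEGER NOT NULL REFERENCES region (region_id),
    city_name TEXT NOT NULL,
    PRIMARY KEY (region_id, city_name)
);
"""

def fk_width(ddl: str) -> int:
    """Number of columns the child table 'city' carries to reference 'region'."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(ddl)
    # PRAGMA foreign_key_list yields one row per column of each foreign key.
    width = len(conn.execute("PRAGMA foreign_key_list(city)").fetchall())
    conn.close()
    return width

print(fk_width(NATURAL), fk_width(SURROGATE))  # 2 1
```

Note that the surrogate is added alongside the natural key, not in place of it: the UNIQUE constraint is the additional key and index this answer refers to.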

A genuine relational database will have full relational capability, honestly achieve 3NF, and use natural relational keys, with a few of them thoughtfully switched to surrogates.

Easily proven

Of course, everything I have said can easily be proven: just post the DDL for 5 to 10 of your spreadsheets; I need at least four levels of "depth" (great-grandparent ⇢ grandparent ⇢ parent ⇢ child).

You may be interested in an answer I recently posted to a related question; it covers points relevant to yours, which I do not repeat here.

Note

  • (1) This is needed only because current SQL platforms do not fully support the relational model and have known performance barriers with wide keys. It would not be needed if and when vendors provide relational databases in which wide relational keys perform as well as narrow ones.

  • I agree with Erwin's statements about keys and identifiers, so I have not repeated them in my answer.

+7

"Can I add the ID field as a primary key to all my database tables and use it to establish relationships between the tables?"

You clearly intend to add a surrogate identifier everywhere, blindly and without any thought. Thinking that this is fine is as foolish as doing it. "Good" identifiers have the properties of uniqueness (otherwise it would not be an identifier, obviously), stability (their values rarely change) and familiarity (their values mean something in the user's world, the world outside the IT system).

Note that I used the word "identifier" rather than "key" very deliberately. Keys have the property of uniqueness by definition, so every key is a valid candidate to serve as an identifier. Which key you actually choose as the identifier should depend on how well that key also meets the criteria of stability and familiarity.

Natural keys may not meet the stability criterion well enough (though the extent to which they fail it is usually greatly exaggerated, typically by developers who think too little about the user's side and too much about their own side of the problem). A system-generated identifier, with absolute certainty, violates the "familiarity" criterion.

These considerations should be enough to show which way the balance should usually tip when trading one criterion off against the other.

"Will this design be considered a 3NF design (third normal form)? If so, is this thing theoretically recommended or not?"

If you add an ID column to an existing design, this will not affect its normal form. Whatever NF your existing design was in, the design with the added ID will be in the same NF.
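Since normal forms are defined in terms of functional dependencies, one way to see this is to check a dependency on sample rows. The sketch below (plain Python, hypothetical employee/department data, not from the question) shows that the transitive dependency dept -> building, which breaks 3NF because dept is not a key, is still present after an id column is added. Keep in mind that a sample instance can only refute a dependency, never prove it; real dependencies come from the business rules.

```python
from typing import Iterable, Sequence

def fd_holds(rows: Iterable[dict], lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """True if the functional dependency lhs -> rhs holds in this set of rows."""
    seen: dict = {}
    for row in rows:
        determinant = tuple(row[a] for a in lhs)
        dependent = tuple(row[a] for a in rhs)
        # The same determinant with a different dependent value refutes the FD.
        if seen.setdefault(determinant, dependent) != dependent:
            return False
    return True

# Hypothetical data: emp -> dept -> building (building depends on a non-key).
rows = [
    {"emp": "ann", "dept": "sales", "building": "B1"},
    {"emp": "bob", "dept": "sales", "building": "B1"},
    {"emp": "cy",  "dept": "ops",   "building": "B2"},
]
# The same rows with a surrogate id column added.
with_id = [{"id": i, **r} for i, r in enumerate(rows, start=1)]

for table in (rows, with_id):
    all_cols = list(table[0])
    # dept -> building holds, but dept does not determine all columns (not a key),
    # so the transitive dependency that violates 3NF survives the added id.
    print(fd_holds(table, ["dept"], ["building"]), fd_holds(table, ["dept"], all_cols))
# Prints "True False" for both tables.
```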

+5

Normal forms are concerned with dependencies between attributes. Without knowing which dependencies you intend your table to represent, we cannot say whether it satisfies any particular normal form.

If you are talking about a surrogate key (a key that has no meaning in the business domain), then for most purposes the important point is that such a key should not be the only key of any table. You should usually also have a natural key (a.k.a. business key) to ensure that data is not duplicated.
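A small sketch of why the surrogate should not be the only key, using Python's built-in sqlite3 module with a hypothetical customer table in which email is assumed to be the natural key. The generated id never collides on its own, so only the natural-key constraint stops the same customer from being stored twice.

```python
import sqlite3

def count_rows_after_duplicate_insert(with_natural_key: bool) -> int:
    """Insert the same customer twice and return how many rows end up stored."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table: 'id' is a surrogate; 'email' is assumed to be the natural key.
    unique_clause = "UNIQUE" if with_natural_key else ""
    conn.execute(
        "CREATE TABLE customer ("
        " id INTEGER PRIMARY KEY,"
        f" email TEXT NOT NULL {unique_clause},"
        " name TEXT NOT NULL)"
    )
    for _ in range(2):
        try:
            # The surrogate id is generated, so it is always "unique".
            conn.execute(
                "INSERT INTO customer (email, name) VALUES (?, ?)",
                ("ann@example.com", "Ann"),
            )
        except sqlite3.IntegrityError:
            pass  # the natural-key constraint rejected the duplicate
    count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
    conn.close()
    return count

print(count_rows_after_duplicate_insert(with_natural_key=False))  # 2
print(count_rows_after_duplicate_insert(with_natural_key=True))   # 1
```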

+2

If I understood you correctly: yes, adding a serial ID to a table and using it as the primary key by which you refer to the rows of that table is generally good design. As to whether it violates 3NF: it does not violate 3NF, but it does not guarantee it either.

In practice, adding a serial ID and using that value internally can have advantages. First, you control the identifier, whereas an externally generated key may suddenly be changed by the other party. On the other hand, exporting the identifier to other parties "locks in" that key, since changing it on your side may affect how others use it. Also, a serial number is often easy to guess, which may let others exploit it.
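To make the first point concrete, here is a minimal sqlite3 sketch with hypothetical supplier and part tables. When the externally assigned code is the key, a change made by the other party has to be cascaded into every referencing row; with an internal serial ID, only the one row that holds the code changes.

```python
import sqlite3

# Hypothetical schemas: the same supplier/part data keyed two ways.
NATURAL_KEY = """
    CREATE TABLE supplier (code TEXT NOT NULL PRIMARY KEY);  -- external code is the key
    CREATE TABLE part (
        supplier_code TEXT NOT NULL REFERENCES supplier (code) ON UPDATE CASCADE,
        part_no       TEXT NOT NULL
    );
    INSERT INTO supplier VALUES ('ACME-01');
    INSERT INTO part VALUES ('ACME-01', 'p1'), ('ACME-01', 'p2'), ('ACME-01', 'p3');
"""

SERIAL_ID = """
    CREATE TABLE supplier (
        id   INTEGER PRIMARY KEY,   -- internal serial ID we control
        code TEXT NOT NULL UNIQUE   -- external code kept as an alternate key
    );
    CREATE TABLE part (
        supplier_id INTEGER NOT NULL REFERENCES supplier (id),
        part_no     TEXT NOT NULL
    );
    INSERT INTO supplier (id, code) VALUES (1, 'ACME-01');
    INSERT INTO part VALUES (1, 'p1'), (1, 'p2'), (1, 'p3');
"""

def rows_holding_new_code(schema: str) -> int:
    """Let the other party rename the code; count rows that now store it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this for ON UPDATE CASCADE
    conn.executescript(schema)
    conn.execute("UPDATE supplier SET code = 'ACME-02' WHERE code = 'ACME-01'")
    total = conn.execute(
        "SELECT COUNT(*) FROM supplier WHERE code = 'ACME-02'").fetchone()[0]
    # Only the natural-key schema also stores the code in the part table.
    part_cols = [r[1] for r in conn.execute("PRAGMA table_info(part)")]
    if "supplier_code" in part_cols:
        total += conn.execute(
            "SELECT COUNT(*) FROM part WHERE supplier_code = 'ACME-02'").fetchone()[0]
    conn.close()
    return total

print(rows_holding_new_code(NATURAL_KEY))  # 4: the supplier row plus 3 cascaded part rows
print(rows_holding_new_code(SERIAL_ID))    # 1: only the supplier row
```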

Also, in practical design, 3NF or Boyce-Codd databases tend to be theoretical ideas that you aspire to rather than follow blindly. Selective denormalization is a well-known trick for speeding up some queries by keeping the data closer together.
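A minimal sqlite3 sketch of what selective denormalization trades away, using hypothetical customer and orders tables. Copying the customer name into each order saves a join on reads, but an update to the master row leaves the copies stale unless extra code keeps them in sync.

```python
import sqlite3

def stale_copies_after_rename() -> int:
    """Rename a customer in the master table only; count order rows left stale."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        -- Denormalized on purpose: the name is copied into each order,
        -- so reading it from orders needs no join.
        CREATE TABLE orders (
            order_id      INTEGER PRIMARY KEY,
            customer_id   INTEGER NOT NULL REFERENCES customer (customer_id),
            customer_name TEXT NOT NULL
        );
        INSERT INTO customer VALUES (1, 'Ann Lee');
        INSERT INTO orders VALUES (10, 1, 'Ann Lee'), (11, 1, 'Ann Lee');
    """)
    # Write path: only the master row is updated, so the copies drift.
    conn.execute("UPDATE customer SET name = 'Ann Park' WHERE customer_id = 1")
    stale = conn.execute("""
        SELECT COUNT(*)
        FROM orders AS o JOIN customer AS c USING (customer_id)
        WHERE o.customer_name <> c.name
    """).fetchone()[0]
    conn.close()
    return stale

print(stale_copies_after_rename())  # 2
```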

0

I totally agree with @jlouis that

3NF or Boyce-Codd tend to be theoretical ideas

From my experience, using a natural key is a good choice only in lookup tables, where the key field is unique in the real world, not null, and does not change over time. In other cases a surrogate key is much preferable (in my view): it is simply a more convenient way to design tables, regardless of what 3NF or Boyce-Codd tells us.
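A sketch of the lookup-table case, assuming ISO 4217 currency codes as the natural key (a hypothetical example, not from the answer). The code is unique, not null and stable, and because it is meaningful, rows in the referencing table are readable without a join, while the foreign key still rejects unknown codes.

```python
import sqlite3

def make_db() -> sqlite3.Connection:
    """Build a hypothetical currency lookup table keyed by its ISO 4217 code."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript("""
        -- Natural key: unique, not null, and stable over time.
        CREATE TABLE currency (
            code TEXT NOT NULL PRIMARY KEY CHECK (length(code) = 3),
            name TEXT NOT NULL
        );
        CREATE TABLE price (
            item          TEXT NOT NULL,
            amount        NUMERIC NOT NULL,
            currency_code TEXT NOT NULL REFERENCES currency (code)
        );
        INSERT INTO currency VALUES ('EUR', 'Euro'), ('USD', 'US dollar');
        INSERT INTO price VALUES ('book', 12.5, 'EUR');
    """)
    return conn

conn = make_db()
# The key value is meaningful on its own, so no join is needed to read it.
print(conn.execute("SELECT item, currency_code FROM price").fetchall())  # [('book', 'EUR')]
try:
    conn.execute("INSERT INTO price VALUES ('pen', 1, 'XYZ')")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)  # an unknown code violates the foreign key
```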
-3