Composite primary keys versus unique object identifier field

I inherited a database built around the idea that composite keys are preferable to a single unique object-identifier field, and that a lone surrogate identifier should never be used as a primary key. Since I was building the Rails interface for this database, I ran into difficulties making it fit the Rails conventions (though with custom views and a few additional gems, non-standard composite keys can be handled).

The rationale the schema's author gave for this design was that the database handles identifier fields inefficiently, and that the b-tree indexes built over them have drawbacks. The explanation had no depth, and I'm still trying to wrap my head around the concept (I am familiar with using composite keys, but not in 100% of cases).

Can anyone suggest opinions or add more depth to this topic?

+66
database design-patterns ruby-on-rails database-design
01 Oct '08 at 18:40
15 answers

Most commonly used engines (MS SQL Server, Oracle, DB2, MySQL, etc.) will not experience noticeable problems with a surrogate key system. Some may even see improved performance from using a surrogate, but performance issues are very platform-specific.

In general terms, the natural key (and, by extension, the composite key) versus surrogate key debate has a long history with no clear "right answer".

Arguments for natural keys (singular or compound) usually include the following:

1) They are already available in the data model. Most entities being modeled already include one or more attributes, or combinations of attributes, that can serve as a key for creating relationships. Adding an extra attribute to every table introduces unnecessary redundancy.

2) They eliminate the need for certain joins. For example, if you have customers with customer codes and invoices with invoice numbers (both of which are "natural"), and you want to retrieve all the invoice numbers for a specific customer code, you can simply use "SELECT InvoiceNumber FROM Invoice WHERE CustomerCode = 'XYZ123'" . In a classic surrogate key approach, the SQL would look something like: "SELECT Invoice.InvoiceNumber FROM Invoice INNER JOIN Customer ON Invoice.CustomerID = Customer.CustomerID WHERE Customer.CustomerCode = 'XYZ123'" .
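To make the join comparison concrete, here is a minimal sketch using SQLite from Python (an illustration only; the table and column names follow the examples above, and the data is invented):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    -- Natural-key design: Invoice carries the customer code directly.
    CREATE TABLE InvoiceNat (
        InvoiceNumber TEXT PRIMARY KEY,
        CustomerCode  TEXT NOT NULL
    );
    -- Surrogate-key design: Invoice carries an opaque CustomerID instead.
    CREATE TABLE Customer (
        CustomerID   INTEGER PRIMARY KEY,
        CustomerCode TEXT NOT NULL UNIQUE
    );
    CREATE TABLE Invoice (
        InvoiceNumber TEXT PRIMARY KEY,
        CustomerID    INTEGER NOT NULL REFERENCES Customer(CustomerID)
    );
    INSERT INTO InvoiceNat VALUES ('INV-1', 'XYZ123');
    INSERT INTO Customer   VALUES (1, 'XYZ123');
    INSERT INTO Invoice    VALUES ('INV-1', 1);
""")

# Natural key: no join needed.
natural = db.execute(
    "SELECT InvoiceNumber FROM InvoiceNat WHERE CustomerCode = 'XYZ123'"
).fetchall()

# Surrogate key: answering the same question requires a join through Customer.
surrogate = db.execute(
    "SELECT Invoice.InvoiceNumber FROM Invoice"
    " INNER JOIN Customer ON Invoice.CustomerID = Customer.CustomerID"
    " WHERE Customer.CustomerCode = 'XYZ123'"
).fetchall()

print(natural, surrogate)  # both: [('INV-1',)]
```

Both queries return the same rows; the surrogate design simply pays one extra join for the indirection.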

3) They contribute to a more universally applicable approach to data modeling. With natural keys, the same design can be used largely unchanged between different SQL engines. Many surrogate key approaches use specific SQL engine methods to generate keys, which requires more specialization of the data model for implementation on different platforms.

Surrogate key arguments tend to revolve around SQL-specific issues:

1) They make it easier to change attributes when business requirements / rules change. This is because they allow you to isolate a data attribute to a single table. This is primarily an issue for SQL engines that implement standard SQL constructs such as DOMAIN inefficiently. When an attribute is defined by a DOMAIN statement, changes to the attribute can be made schema-wide with an ALTER DOMAIN statement. Different SQL engines have different performance characteristics for altering a domain, and some SQL engines do not implement DOMAINs at all, so data modelers compensate for these situations by adding surrogate keys to make attribute changes easier.

2) They make concurrency easier to implement than natural keys do. With a natural key, if two users are working concurrently with the same row, such as a customer record, and one user changes the value of the natural key, the second user's update will fail, because the customer code it refers to no longer exists in the database. With a surrogate key, the update processes successfully, because immutable identifier values, not mutable customer codes, are used to identify rows. However, it is not always desirable for the second update to go through: if the customer code has changed, the second user may be acting on stale information and could be updating the wrong row, since the effective "identity" of the row has changed. Neither surrogate keys nor natural keys by themselves solve this; end-to-end concurrency has to be addressed outside of the key implementation.
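The asymmetry described above can be sketched like this (SQLite from Python for illustration; the names and values are invented):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE Customer (
        CustomerID   INTEGER PRIMARY KEY,      -- immutable surrogate
        CustomerCode TEXT NOT NULL UNIQUE,     -- mutable natural key
        Name         TEXT
    )
""")
db.execute("INSERT INTO Customer VALUES (1, 'XYZ123', 'Acme')")

# User A renames the customer code while user B still holds the old value.
db.execute("UPDATE Customer SET CustomerCode = 'ABC999'"
           " WHERE CustomerCode = 'XYZ123'")

# User B's update keyed by the stale natural key silently touches no rows...
by_natural = db.execute(
    "UPDATE Customer SET Name = 'Acme Ltd' WHERE CustomerCode = 'XYZ123'"
).rowcount

# ...while the same update keyed by the surrogate still finds the row.
by_surrogate = db.execute(
    "UPDATE Customer SET Name = 'Acme Ltd' WHERE CustomerID = 1"
).rowcount

print(by_natural, by_surrogate)  # 0 1
```

Note that the surrogate update "succeeding" here is exactly the double-edged behavior the answer warns about: the row is found, but the user may no longer be looking at the entity they think they are.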

3) They perform better than natural keys. Performance depends mostly on the SQL engine. The same database schema, implemented on the same hardware with different SQL engines, often shows dramatically different performance characteristics, owing to each engine's storage and retrieval mechanisms. Some engines behave close to flat-file systems, effectively storing data redundantly whenever the same attribute, such as a customer code, appears in several places in the schema. That redundant storage can cause performance problems when you need to change the data or the schema. Other engines provide a better separation between the data model and the storage / retrieval system, which allows faster data and schema changes.

4) Surrogate keys work better with certain data access libraries and GUI frameworks. Due to the homogeneity of most surrogate key designs (for example, all relational keys are integers), data access libraries, ORMs, and GUI frameworks can work with the data without needing special knowledge of it. Natural keys, because of their heterogeneity (different data types, sizes, etc.), do not work as well with automated or semi-automated tools and libraries. For specialized scenarios, such as embedded SQL databases, designing the database for a particular toolset may be appropriate. In other scenarios, databases are enterprise information resources accessed concurrently by multiple platforms, applications, reporting systems, and devices, and so do not benefit from being designed around any particular library or framework. Additionally, databases designed to work with specific tools become a liability when the next great toolset comes along.

I tend to fall on the natural key side (obviously), but I'm not fanatical about it. Because of the environment I work in, where any database I help design may be used by a variety of applications, I use natural keys for most of my data modeling and rarely add surrogates. However, I don't go out of my way to retrofit existing databases that use surrogates. Surrogate-key systems work perfectly well; there's no need to change something that's already working.

There are several excellent resources that discuss the merits of each approach:

http://www.google.com/search?q=natural+key+surrogate+key

http://www.agiledata.org/essays/keys.html

http://www.informationweek.com/news/software/bi/201806814

+82
01 Oct '08 at

I have been developing database applications for 15 years, and I have yet to come across a case where a natural key was a better choice than a surrogate key.

I'm not saying such a case doesn't exist; I'm just saying that once you take into account the practical problems of developing an application that accesses the database, the advantages of a surrogate key usually outweigh the theoretical purity of natural keys.

+30
01 Oct '08 at 19:05

the primary key should be constant and meaningless; non-surrogate keys usually end up violating one or both of those requirements, eventually

  • if the key is not constant, you have a future-update problem that can get quite messy

  • if the key is not meaningless, it is more likely to change, i.e. not be constant; see above

Take a simple, common example: an inventory items table. It may be tempting to make the item number (SKU, barcode, part code, or whatever) the primary key, but then a year later all the item numbers change, and you are left with a very messy database-wide update to make...
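That messy update can be sketched as follows (SQLite from Python for illustration; the SKU values are invented, and ON UPDATE CASCADE is just one way an engine might propagate the renumbering):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")
db.executescript("""
    CREATE TABLE Item (
        ItemNumber TEXT PRIMARY KEY            -- natural key: the SKU itself
    );
    CREATE TABLE StockMove (
        MoveID     INTEGER PRIMARY KEY,
        ItemNumber TEXT NOT NULL
            REFERENCES Item(ItemNumber) ON UPDATE CASCADE
    );
    INSERT INTO Item VALUES ('SKU-OLD-1');
    INSERT INTO StockMove VALUES (1, 'SKU-OLD-1');
""")
# Renumbering the item forces every referencing row to be rewritten as well;
# without ON UPDATE CASCADE the engine would simply reject the UPDATE.
db.execute("UPDATE Item SET ItemNumber = 'SKU-NEW-1'"
           " WHERE ItemNumber = 'SKU-OLD-1'")
moved = db.execute("SELECT ItemNumber FROM StockMove").fetchone()
print(moved)  # ('SKU-NEW-1',)
```

With a surrogate key, the renumbering would touch one column in one table, and every foreign key would be left alone.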

EDIT: there is an additional problem, more practical than philosophical. In many cases you will find a particular row somehow, then later update it or find it again (or both). With composite keys there is more data to keep track of and more conditions in the WHERE clause for the repeated lookup or the update (or delete), and it is also possible that one of the key segments has changed in the meantime! With a surrogate key there is always just one value to hold on to (the surrogate ID), and by definition it cannot change, which simplifies the situation considerably.

+19
01 Oct '08 at 19:16

It sounds like the person who designed the database was on the natural-key side of the great natural-versus-surrogate-key debate.

I have never heard of any problems with b-trees on ID fields, but then I haven't studied it in any great depth either...

I fall on the surrogate key side: you have less repetition with a surrogate key, because you repeat only a single value in the other tables. Since humans rarely join tables by hand, we don't much care that it's an opaque number. Also, since the index covers a single fixed-size column, it is safe to assume that surrogate primary key lookups are faster.

+11
Oct 01 '08 at 18:53

Using unique (object) ID fields makes joins easy, but you should still keep the other (possibly composite) key unique: do NOT relax its NOT NULL constraints, and DO maintain the unique constraint.

If the DBMS cannot handle unique integers efficiently, it has bigger problems. That said, using a unique (object) ID plus another key takes more space (for the indexes) than the other key alone, and there are two indexes to update on every insert, so it isn't free. But as long as you keep the original key, you'll be fine. If you drop the other key, you break the design of your system; all hell will eventually break loose (and you may or may not notice that it has).
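The two-indexes cost is easy to observe (SQLite from Python for illustration; the table and column names are invented):

```python
import sqlite3

db = sqlite3.connect(":memory:")
# A GUID-style surrogate PK plus the original composite key kept as UNIQUE:
db.execute("""
    CREATE TABLE OrderLine (
        RowId   TEXT PRIMARY KEY,            -- surrogate (e.g. a GUID string)
        OrderNo TEXT NOT NULL,
        LineNo  INTEGER NOT NULL,
        UNIQUE (OrderNo, LineNo)             -- the original key, preserved
    )
""")
# SQLite creates one automatic index per constraint, so every INSERT into
# this table now maintains two indexes: one for the PK, one for the
# unique composite key.
indexes = db.execute("PRAGMA index_list('OrderLine')").fetchall()
print(len(indexes))  # 2
```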

+5
Oct 01 '08 at 18:46

I am mostly on the surrogate key team, and even though I appreciate and understand arguments such as those presented here by JeremyDWill, I'm still looking for the case where a "natural" key is better than a surrogate...

Other posts on this issue typically deal with relational database theory and database performance. Another interesting argument, always forgotten in this debate, relates to table normalization and code productivity:

Every time I create a table, do I have to lose time

  • specifying its primary key and its physical characteristics (type, size)?
  • remembering those characteristics every time I want to reference it in my code?
  • explaining my choice of PK to the other developers on the team?

My answer to all of these questions is no:

  • I don't have time to waste figuring out the "best primary key" when dealing with a list of persons.
  • I don't want to have to remember that the primary key of my " Computer " table is a 64-character string (does Windows even accept that many characters for a computer name?).
  • I don't want to explain my choice to other developers, one of whom will eventually say "Yes, but what if you need to manage computers across different domains? Does a 64-character string leave room for the domain name plus the computer name?".

So for the last five years I have worked with a very simple rule: every table (call it " myTable ") has as its first field " id_MyTable ", of unique identifier type. Even when a table represents a many-to-many relationship, such as a ComputerUser table where the combination of id_Computer and id_User would make a perfectly acceptable primary key, I prefer to create an id_ComputerUser field of unique identifier type, just to stick to the rule.

The main advantage is that you never have to think about which columns serve as primary and/or foreign keys in your code. Once you know a table's name, you know its PK's name and type. Once you know which relationships are implemented in your data model, you know the names of the foreign keys available in a table.

I'm not sure my rule is the best. But it is very effective!

+4
Oct 02 '08 at 20:12

Using natural keys makes any automatic ORM a nightmare to use as a persistence layer. Also, multi-column foreign keys tend to overlap, which causes additional problems when navigating and updating relationships in OO code.

You can, however, demote the natural key to a unique constraint and add an auto-generated identifier. That doesn't fix the foreign keys by itself; they will still need to be changed by hand. Hopefully the multi-column and overlapping constraints are the minority of your relationships, so you can focus the refactoring where it matters most.
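A sketch of that conversion (SQLite from Python for illustration; names are invented, and the table is recreated because SQLite cannot change a primary key in place):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    -- Before: a natural composite primary key.
    CREATE TABLE ComputerUserOld (
        ComputerName TEXT NOT NULL,
        UserName     TEXT NOT NULL,
        PRIMARY KEY (ComputerName, UserName)
    );
    INSERT INTO ComputerUserOld VALUES ('host1', 'alice');

    -- After: an auto-generated surrogate id; the natural key survives
    -- as a UNIQUE constraint, so it still guards the data.
    CREATE TABLE ComputerUser (
        ComputerUserID INTEGER PRIMARY KEY AUTOINCREMENT,
        ComputerName   TEXT NOT NULL,
        UserName       TEXT NOT NULL,
        UNIQUE (ComputerName, UserName)
    );
    INSERT INTO ComputerUser (ComputerName, UserName)
        SELECT ComputerName, UserName FROM ComputerUserOld;
    DROP TABLE ComputerUserOld;
""")
row = db.execute("SELECT * FROM ComputerUser").fetchone()
print(row)  # (1, 'host1', 'alice')
```

Any tables referencing (ComputerName, UserName) would still have to be repointed at ComputerUserID by hand, which is the manual step mentioned above.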

Natural PKs have their motivations and use cases and are not Bad (tm); they just don't play well with ORMs.

I feel that, like any other concept, natural keys and table normalization should be applied where reasonable, not as blind design constraints.

+3
01 Oct. '08 at 18:55

I'll be short and sweet: composite primary keys are not much good these days. Add surrogate keys where you can and keep the existing key columns under unique constraints. The ORM is happy, you're happy, and the original designer isn't very happy, but unless he's your boss he can deal with it.

+3
01 Oct '08 at 19:29

... how the database handles identifier fields inefficiently, and that the b-tree indexes built over them have drawbacks ...

That was almost certainly nonsense, but it may have been alluding to the index block contention that occurs when incrementing PK values are assigned at high rates from different sessions. If so, a REVERSE KEY index can help, albeit at the cost of a larger index due to the change in the block-split behavior. http://download.oracle.com/docs/cd/B19306_01/server.102/b14220/schema.htm#sthref998

Go with synthetic keys, especially if doing so speeds up development with your toolset.

+2
Oct 01 '08 at 20:09

A practical approach for a new design is to use surrogate keys for tables that will hold many thousands of records with multi-column uniqueness, and composite keys for short descriptive lookup tables. I usually find that academia dictates the use of surrogate keys, while real-world programmers often prefer composite keys. You really need to apply the right type of primary key to each table, rather than adopting one approach everywhere.

+2
Oct 21 '08 at 20:33

I'm not that experienced, but I'm still in favor of using a surrogate identifier as the primary key. Here is an explanation, using an example.

The format of external data may change over time. For example, you might think that the ISBN of a book would make a good primary key in a books table. After all, ISBNs are unique. But as this particular book is being written, the publishing industry in the United States is gearing up for a major change as additional digits are added to all ISBNs.

If we'd used the ISBN as the primary key in the books table, we'd have to update every row to reflect this change. But then there's another problem. There may be other tables in the database that reference rows in the books table via the primary key. We can't change the key in the books table unless we first go through and update all of these references, and that involves dropping foreign key constraints, updating the referencing tables, updating the books table, and finally re-creating the constraints. All in all, something of a pain.

The problems go away if we use our own internal value as the primary key. No third party can come along and arbitrarily tell us to change our schema; we control our own key space. And if something like the ISBN really does need to change, it can change without affecting any of the existing relationships in the database. In effect, we've decoupled the knitting together of rows from the external representation of the data in those rows.

The explanation is a bit bookish, but I think it lays things out simply.
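The ISBN example can be sketched as follows (SQLite from Python for illustration; the ISBN values are made up):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")
db.executescript("""
    CREATE TABLE Book (
        BookID INTEGER PRIMARY KEY,
        ISBN   TEXT NOT NULL UNIQUE    -- external identifier, kept as data
    );
    CREATE TABLE Review (
        ReviewID INTEGER PRIMARY KEY,
        BookID   INTEGER NOT NULL REFERENCES Book(BookID)
    );
    INSERT INTO Book VALUES (1, '0-9745140-5-5');
    INSERT INTO Review VALUES (1, 1);
""")
# The publishing industry changes the ISBN format: one UPDATE on one
# column, and no foreign key anywhere needs to be touched.
db.execute("UPDATE Book SET ISBN = '978-0-9745140-5-4' WHERE BookID = 1")
joined = db.execute(
    "SELECT Review.ReviewID, Book.ISBN FROM Review"
    " JOIN Book ON Review.BookID = Book.BookID"
).fetchone()
print(joined)  # (1, '978-0-9745140-5-4')
```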

+2
21 Apr '18

Composite keys can be fine, and they can affect performance, but they are not the only answer, just as the single (surrogate) key is not the only answer.

I'm struck by the uncertainty in the discussion around choosing composite keys. More often than not, uncertainty about something technical indicates a misunderstanding, perhaps picked up from a recommendation in a book or article...


+1
01 Oct '08 at 19:22

@JeremyDWill


+1
02 Oct '08 at 8:42

One practical compromise, when you inherit a schema built on composite natural keys but want to use an ORM: leave the composite primary key in place, and add an extra "RowID" column of GUID type with a default of newid(). It is not the primary key, but it is unique, and the ORM can use RowID to identify rows.

So you can:

CREATE TABLE dbo.Invoice (
  CustomerId varchar(10) not null,
  CustomerOrderNo varchar(10) not null,
  InvoiceAmount money not null,
  Comments nvarchar(4000),
  RowId uniqueidentifier not null unique default(newid()),

  primary key(CustomerId, CustomerOrderNo)
 )

That way the legacy schema keeps working as designed, and the ORM gets the single-column identifier it wants!

+1
24 Oct '08 at 15:03
