When should you use a composite primary key?

ETA: My question is about keeping the database optimal. What is the difference in performance and size between giving ProjectUserBooleanAttribute a composite primary key, which presumably comes with an index on PUAT_Id and UserID, and using a non-composite table with an auto-increment PK plus indexes on PUAT_Id and UserID? From further reading, it seems that with the non-composite approach I would have to create a unique index on these two columns. Do I need to create indexes on these two columns? If so, does that not mean that every column in this table has its own index?

Is this the quintessential trade-off between database size (indexes) and performance?


So, I have the following objects that I want to create:

  • Projects
  • Users
  • ProjectUserAttributeTypes

In this simplified example, all my ProjectUserAttributeTypes are boolean, so I am only showing the ProjectUserBooleanAttribute table.

Let's say I want to create two boolean ProjectUserAttributeType objects called Silver and Gold. I would just create two rows in ProjectUserAttributeTypes. Now, if I want to designate a user as having one of these attributes, I would add a row to ProjectUserBooleanAttribute.

The database administrator warned me against using composite primary keys in general, for performance reasons. However, in this case I do not see what I gain by NOT using a composite key. In both cases, I need to make sure that ProjectUserBooleanAttribute has non-null and unique values for all columns. I also definitely need indexes.

NOTE: My ultimate goal is to query my database and find all users that have certain combinations of attributes. I would join the tables to filter by project, and then use WHERE conditions to filter further. A few examples:

  • (GOLD OR SILVER)
  • (GOLD XOR SILVER)
  • ((GOLD OR SILVER) AND NOT (BRONZE))
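A minimal sketch of how such combinations could be queried against the composite-key layout, using Python's built-in sqlite3 module. The table definitions are simplified, the sample rows and the helper `users_matching` are assumptions for illustration, and the project filter is left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ProjectUserAttributeTypes (
    PUAT_Id INTEGER PRIMARY KEY,
    Name    TEXT NOT NULL
);
CREATE TABLE ProjectUserBooleanAttribute (
    PUAT_Id INTEGER NOT NULL REFERENCES ProjectUserAttributeTypes (PUAT_Id),
    UserID  INTEGER NOT NULL,
    PRIMARY KEY (PUAT_Id, UserID)   -- composite primary key
);
INSERT INTO ProjectUserAttributeTypes VALUES (1, 'GOLD'), (2, 'SILVER'), (3, 'BRONZE');
-- Made-up data: user 10 GOLD; 11 GOLD+SILVER; 12 SILVER+BRONZE; 13 BRONZE
INSERT INTO ProjectUserBooleanAttribute VALUES
    (1, 10), (1, 11), (2, 11), (2, 12), (3, 12), (3, 13);
""")

def users_matching(condition):
    """Return user ids whose attribute flags satisfy `condition`.

    `condition` is a trusted, hard-coded SQL fragment over the 0/1 columns
    gold, silver and bronze; never build it from user input.
    """
    sql = f"""
        SELECT UserID FROM (
            SELECT a.UserID,
                   MAX(t.Name = 'GOLD')   AS gold,
                   MAX(t.Name = 'SILVER') AS silver,
                   MAX(t.Name = 'BRONZE') AS bronze
            FROM ProjectUserBooleanAttribute AS a
            JOIN ProjectUserAttributeTypes AS t ON t.PUAT_Id = a.PUAT_Id
            GROUP BY a.UserID
        )
        WHERE {condition}
        ORDER BY UserID
    """
    return [row[0] for row in conn.execute(sql)]

gold_or_silver = users_matching("gold = 1 OR silver = 1")
gold_xor_silver = users_matching("gold <> silver")
gold_or_silver_not_bronze = users_matching("(gold = 1 OR silver = 1) AND bronze = 0")
```

The inner query pivots each user's rows into one flag per attribute, so each boolean expression from the list becomes an ordinary WHERE condition.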

COMPOSITE PK

[Image: composite ERD]

VS. NON-COMPOSITE PK

[Image: non-composite ERD]

+5
2 answers

There are two main design concepts for relational databases:

  • natural keys
  • identifiers

With natural keys, you use the keys the data already provides: a project is identified by its project number, a user by their login name, and so on. This often leads to composite (or compound) keys:

  • project ( project_no , name, ...)
  • user ( username , first_name, last_name, ...)
  • project_user ( project_no, username , role, ...)

The project_user table has a composite key: the project number and the username together uniquely identify a record, telling us who is working on which project.

With identifiers, you usually add a technical identifier that is used only to reference records and has no meaning to the user:

  • project ( project_id , project_no, name, ...)
  • user ( user_id , username, first_name, last_name, ...)
  • project_user ( project_user_id , project_id, user_id, role, ...)

The tables contain the same fields plus the identifiers, and you need the same unique and not-null constraints as with natural keys, plus the constraints on the identifiers.

The project_user_id in project_user is, of course, only required when some other table needs to reference it. But often every table gets an identifier whether it is needed or not, just so that all tables look the same (and so that the identifiers already exist if they are needed later).

At first glance, an ID-based database seems to be just more work and more indexes for no gain, but that is not the case. The ID concept is often chosen because it gives more freedom. Example: what happens if a project number changes? With natural keys, the project number appears in many tables and must somehow be updated in all of them by cascading the change, which can become quite a challenge. In the ID database, you simply change the project number in one place.
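A minimal sketch of the renumbering case in SQLite via Python's sqlite3 module. The `nk_` and `id_` table prefixes and the sample values are assumptions made only to keep both variants in one database. In the natural-key variant the child rows have to be rewritten (here via ON UPDATE CASCADE); in the ID variant only the project row changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
-- Natural-key variant: project_no is repeated in every child table.
CREATE TABLE nk_project (
    project_no TEXT PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE nk_project_user (
    project_no TEXT NOT NULL
        REFERENCES nk_project (project_no) ON UPDATE CASCADE,
    username   TEXT NOT NULL,
    PRIMARY KEY (project_no, username)
);
-- ID variant: children reference the technical id, project_no lives in one place.
CREATE TABLE id_project (
    project_id INTEGER PRIMARY KEY,
    project_no TEXT NOT NULL UNIQUE,
    name       TEXT NOT NULL
);
CREATE TABLE id_project_user (
    project_user_id INTEGER PRIMARY KEY,
    project_id      INTEGER NOT NULL REFERENCES id_project (project_id),
    user_id         INTEGER NOT NULL,
    UNIQUE (project_id, user_id)
);
INSERT INTO nk_project VALUES ('P-100', 'Demo');
INSERT INTO nk_project_user VALUES ('P-100', 'alice'), ('P-100', 'bob');
INSERT INTO id_project VALUES (1, 'P-100', 'Demo');
INSERT INTO id_project_user (project_id, user_id) VALUES (1, 10), (1, 11);
""")

# Renumber the project in both variants.
conn.execute("UPDATE nk_project SET project_no = 'P-200' WHERE project_no = 'P-100'")
conn.execute("UPDATE id_project SET project_no = 'P-200' WHERE project_id = 1")

# Natural keys: the cascade rewrote every child row.
nk_children = [r[0] for r in conn.execute(
    "SELECT project_no FROM nk_project_user ORDER BY username")]
# IDs: the child rows still hold the unchanged technical id.
id_children = [r[0] for r in conn.execute(
    "SELECT project_id FROM id_project_user ORDER BY user_id")]
```

Without the ON UPDATE CASCADE clause, the first UPDATE would be rejected while child rows still reference the old number, which is the maintenance burden described above.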

And what will happen if project numbers suddenly turn out to be unique only within a company? In an ID-based database, you add company_id to the project table, add a unique index on company_id and project_no, and are done with it. With natural keys, the company number (ILN? Artificial number?) must be added to the primary key and carried into all child tables. So: when you create a database with natural keys, you have to think all this through in order to get stable natural keys, and sometimes there are none, and then you need to invent them. With identifiers, you do not care whether the fields can change or not. Thus, an ID-based database is easier to implement.

Then there is hierarchy. Suppose you have several companies in your database, each with its own items and its own warehouses.

Natural keys:

  • company ( company_code , name, ...)
  • item ( company_code, item_no , name, ...)
  • warehouse ( company_code, warehouse_no , address, ...)
  • stock ( company_code, warehouse_no, item_no , quantity, ...)

Identifiers:

  • company ( company_id , name, ...)
  • item ( item_id , item_no, name, company_id, ...)
  • warehouse ( warehouse_id , address, company_id, ...)
  • stock ( stock_id , warehouse_id, item_id, quantity, ...)

With the ID concept, you do not need to store company_id in the stock table again, because it is known from the parent tables. Storing it would even be redundant, whereas in the natural-key concept it is necessary because it is part of the composite key, and without it we would lose the link to the parent tables. Some people consider this purity a great advantage of the identifier concept over natural keys. However, there is a drawback. The natural-key database guarantees that a company's items are stocked only in that company's warehouses, because the company is part of the stock table's key. With the ID concept, the referenced warehouse may belong to company 1 and the referenced item to company 2. That is inconsistent data caused by an incorrect insert, which the DBMS cannot prevent. With natural keys, such an error cannot occur.
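The difference can be sketched in SQLite via Python's sqlite3 module. This is an illustration under assumptions: the `nk_` and `id_` prefixes, the trimmed column lists and the sample rows are invented here, and foreign-key enforcement must be switched on explicitly in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Natural keys: company_code is part of every key.
CREATE TABLE nk_company (company_code TEXT PRIMARY KEY);
CREATE TABLE nk_item (
    company_code TEXT NOT NULL REFERENCES nk_company (company_code),
    item_no      TEXT NOT NULL,
    PRIMARY KEY (company_code, item_no)
);
CREATE TABLE nk_warehouse (
    company_code TEXT NOT NULL REFERENCES nk_company (company_code),
    warehouse_no TEXT NOT NULL,
    PRIMARY KEY (company_code, warehouse_no)
);
CREATE TABLE nk_stock (
    company_code TEXT NOT NULL,
    warehouse_no TEXT NOT NULL,
    item_no      TEXT NOT NULL,
    quantity     INTEGER NOT NULL,
    PRIMARY KEY (company_code, warehouse_no, item_no),
    FOREIGN KEY (company_code, warehouse_no)
        REFERENCES nk_warehouse (company_code, warehouse_no),
    FOREIGN KEY (company_code, item_no)
        REFERENCES nk_item (company_code, item_no)
);
-- Identifiers: stock references warehouse and item by id only.
CREATE TABLE id_company (company_id INTEGER PRIMARY KEY);
CREATE TABLE id_item (
    item_id    INTEGER PRIMARY KEY,
    company_id INTEGER NOT NULL REFERENCES id_company (company_id)
);
CREATE TABLE id_warehouse (
    warehouse_id INTEGER PRIMARY KEY,
    company_id   INTEGER NOT NULL REFERENCES id_company (company_id)
);
CREATE TABLE id_stock (
    stock_id     INTEGER PRIMARY KEY,
    warehouse_id INTEGER NOT NULL REFERENCES id_warehouse (warehouse_id),
    item_id      INTEGER NOT NULL REFERENCES id_item (item_id),
    quantity     INTEGER NOT NULL,
    UNIQUE (warehouse_id, item_id)
);
-- Warehouse W1 belongs to company C1 / 1, item I1 belongs to company C2 / 2.
INSERT INTO nk_company VALUES ('C1'), ('C2');
INSERT INTO nk_warehouse VALUES ('C1', 'W1');
INSERT INTO nk_item VALUES ('C2', 'I1');
INSERT INTO id_company VALUES (1), (2);
INSERT INTO id_warehouse VALUES (1, 1);
INSERT INTO id_item VALUES (1, 2);
""")

# ID variant: both references are valid on their own, so the mixed row is accepted.
conn.execute("INSERT INTO id_stock (warehouse_id, item_id, quantity) VALUES (1, 1, 5)")
id_mixed_rows = conn.execute("SELECT COUNT(*) FROM id_stock").fetchone()[0]

# Natural-key variant: one company_code must satisfy both composite FKs.
nk_rejected = False
try:
    conn.execute("INSERT INTO nk_stock VALUES ('C1', 'W1', 'I1', 5)")
except sqlite3.IntegrityError:
    nk_rejected = True
```

Because the stock row carries a single company_code that both composite foreign keys share, no value of it can point at a warehouse of one company and an item of another.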

And if I want to know how much stock a company has, with natural keys I just select from stock. In the ID database, I need to select from stock plus another table to get the company.
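As a rough illustration of the two queries, here is a sketch using Python's sqlite3 module. The tables are trimmed to the columns the queries need, and the `nk_`/`id_` prefixes and quantities are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Natural keys: the company is part of every stock row.
CREATE TABLE nk_stock (
    company_code TEXT NOT NULL,
    warehouse_no TEXT NOT NULL,
    item_no      TEXT NOT NULL,
    quantity     INTEGER NOT NULL,
    PRIMARY KEY (company_code, warehouse_no, item_no)
);
INSERT INTO nk_stock VALUES
    ('C1', 'W1', 'I1', 5), ('C1', 'W2', 'I1', 7), ('C2', 'W1', 'I9', 3);

-- Identifiers: the company is only reachable through the warehouse.
CREATE TABLE id_warehouse (
    warehouse_id INTEGER PRIMARY KEY,
    company_id   INTEGER NOT NULL
);
CREATE TABLE id_stock (
    stock_id     INTEGER PRIMARY KEY,
    warehouse_id INTEGER NOT NULL REFERENCES id_warehouse (warehouse_id),
    item_id      INTEGER NOT NULL,
    quantity     INTEGER NOT NULL,
    UNIQUE (warehouse_id, item_id)
);
INSERT INTO id_warehouse VALUES (1, 1), (2, 1), (3, 2);
INSERT INTO id_stock (warehouse_id, item_id, quantity) VALUES
    (1, 1, 5), (2, 1, 7), (3, 9, 3);
""")

# Natural keys: a single-table aggregate.
nk_totals = conn.execute("""
    SELECT company_code, SUM(quantity)
    FROM nk_stock
    GROUP BY company_code
    ORDER BY company_code
""").fetchall()

# Identifiers: the same question needs a join to find the company.
id_totals = conn.execute("""
    SELECT w.company_id, SUM(s.quantity)
    FROM id_stock AS s
    JOIN id_warehouse AS w ON w.warehouse_id = s.warehouse_id
    GROUP BY w.company_id
    ORDER BY w.company_id
""").fetchall()
```

With a deeper hierarchy, each extra level adds another join on the ID side, while the natural-key table still answers the question on its own.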

With a large hierarchy, queries in the ID database can end up joining many, many more tables. So far, I have not seen an ID-based database outperform a natural-key database, but I have seen natural-key databases outperform ID-based ones. Perhaps this is because I have mostly seen large databases with a large hierarchy.

As for your database: it looks like it is ID-based, provided the project ID and user ID are only technical internal numbers. Otherwise your database would be a mixed concept (natural project number, natural user identifier, technical identifier for ProjectUserBooleanAttribute). So your question is not really about composite keys versus no composite keys.

PUAT_Id and UserID must be in ProjectUserBooleanAttribute, they must be NOT NULL, and you must have a unique constraint (unique index) on them. Thus, they have all the qualities required of a primary key, regardless of whether you call them the "primary key" or not. Whether you add a technical identifier on top is a matter of appearance only. It does not change anything. The concept remains the same.

In the natural-key concept, you would make these fields the primary key. But then you would not have PUAT_Id here, but some kind of composite key (ProjectId plus AttributeType?).

In the technical-identifier concept, you do not make these fields the primary key, but you make them NOT NULL and add a unique constraint (which makes them a key, only not the one called "primary"). Then you either add the technical identifier as the primary key, or leave the table without the identifier and thus without a primary key. It does not matter. If some other table needs an identifier to reference, provide one; if not, you can do without it. It is superfluous if no other table requires it.
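A minimal sketch of the two layouts in SQLite via Python's sqlite3 module. The table names `attr_composite` and `attr_surrogate` and the `Id` column are hypothetical, and how many physical indexes each layout gets is engine-specific, so the index listing is only indicative for SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Variant 1: the two columns are the primary key.
CREATE TABLE attr_composite (
    PUAT_Id INTEGER NOT NULL,
    UserID  INTEGER NOT NULL,
    PRIMARY KEY (PUAT_Id, UserID)
);
-- Variant 2: technical identifier as PK, same columns NOT NULL + UNIQUE.
CREATE TABLE attr_surrogate (
    Id      INTEGER PRIMARY KEY AUTOINCREMENT,
    PUAT_Id INTEGER NOT NULL,
    UserID  INTEGER NOT NULL,
    UNIQUE (PUAT_Id, UserID)
);
""")

def rejects_duplicate(table):
    """Insert the same (PUAT_Id, UserID) pair twice; True if the 2nd insert fails."""
    conn.execute(f"INSERT INTO {table} (PUAT_Id, UserID) VALUES (1, 10)")
    try:
        conn.execute(f"INSERT INTO {table} (PUAT_Id, UserID) VALUES (1, 10)")
    except sqlite3.IntegrityError:
        return True
    return False

def index_names(table):
    """Names of the indexes SQLite maintains for `table`."""
    return [row[1] for row in conn.execute(f"PRAGMA index_list({table})")]

composite_rejects = rejects_duplicate("attr_composite")
surrogate_rejects = rejects_duplicate("attr_surrogate")

for table in ("attr_composite", "attr_surrogate"):
    print(table, index_names(table))
```

Both variants enforce the same uniqueness rule through an index on (PUAT_Id, UserID); the surrogate variant only adds the extra Id column on top.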

+13

When you add an id column to a table, it inevitably adds overhead for managing that table. But the advantage is that other tables can now reference the rows of this table through the single id column instead of the old composite key. This can reduce storage and index size in those tables and make access to them faster. It can also make references to the corresponding entities shorter (only one column) and more obvious (by the id type).

Note that in tables that now use the added id as an FK, if you keep any of the old composite FK columns alongside the id, then you need a constraint ensuring that the values in those columns match the values of the same columns in the row that the id refers to.
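One way to express such a constraint is a composite foreign key over the id plus the retained column, sketched here in SQLite via Python's sqlite3 module. The child table `project_user_note` and the sample values are hypothetical; the extra UNIQUE constraint on the parent exists because SQLite requires the referenced columns to be covered by a primary key or unique index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE project_user (
    project_user_id INTEGER PRIMARY KEY,
    project_id      INTEGER NOT NULL,
    user_id         INTEGER NOT NULL,
    UNIQUE (project_id, user_id),
    UNIQUE (project_user_id, project_id)   -- target of the composite FK below
);
CREATE TABLE project_user_note (
    note_id         INTEGER PRIMARY KEY,
    project_user_id INTEGER NOT NULL,
    project_id      INTEGER NOT NULL,      -- retained column from the old key
    FOREIGN KEY (project_user_id, project_id)
        REFERENCES project_user (project_user_id, project_id)
);
INSERT INTO project_user VALUES (1, 100, 7);
""")

# Consistent row: project_id matches the referenced project_user row.
conn.execute(
    "INSERT INTO project_user_note (project_user_id, project_id) VALUES (1, 100)")

# Inconsistent row: same id, different project_id, so the FK check fails.
mismatch_rejected = False
try:
    conn.execute(
        "INSERT INTO project_user_note (project_user_id, project_id) VALUES (1, 200)")
except sqlite3.IntegrityError:
    mismatch_rejected = True

note_count = conn.execute("SELECT COUNT(*) FROM project_user_note").fetchone()[0]
```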

0