Primary Key / Clustered Key for Junction Tables

Say we have a Product table, an Order table, and a junction table, ProductOrder.

ProductOrder will have ProductID and OrderID.
On most of our systems, these tables also have an autonumber ID column.

What is the best practice for placing the primary key (and therefore the clustered key)?

  • Should I keep the primary key on the ID field and create a non-clustered index on the foreign key pair (ProductID and OrderID)?

  • Or should I put the primary key on the foreign key pair (ProductID and OrderID) and put a non-clustered index on the ID column (if it is needed at all)?

  • Or ... (clever remark from one of you :))

+7
sql clustered-index junction-table
4 answers

I know these words can make you cringe, but "it depends."

Most likely, you want the physical order to be based on ProductID and/or OrderID rather than on the autonumber (surrogate) column, since the autonumber has no natural meaning in your database. You probably want the junction table ordered on the same field as the parent table.

  • First understand why and how you use the surrogate key ID in the first place; that will often determine how you index it. I will assume you are using a surrogate key because you are using some framework or tool that works well with single-column keys. If there is no specific design reason, then for a junction table I would simplify matters and just drop the autonumber ID, since it brings no other benefit. The primary key then becomes (ProductID, OrderID). If you keep the ID, you need to at least make sure your index on (ProductID, OrderID) is unique to maintain data integrity.

  • Clustered indexes are good for sequential scans and merges when a query needs results in the same order as the index. So look at your access patterns: find out which key(s) you use for sequential, multi-row fetches or scans, and which you use for random, single-row access. Create the clustered index on the key you scan the most, and a non-clustered index on the key you use for random access. You have to choose one or the other, since a table can have only one clustered index.
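
As a rough sketch of the second point, assuming SQL Server (T-SQL) syntax and that you keep the surrogate: the pair is what gets scanned, so it takes the clustered index through a unique constraint, while the autonumber column keeps a non-clustered primary key for single-row lookups. The column types and constraint names are made up for illustration, and the foreign keys to the parent tables are left out so the snippet runs on its own in a scratch database.

```sql
-- Hypothetical schema; types and constraint names are assumptions.
-- Foreign keys to Product and Order are omitted to keep this self-contained.
CREATE TABLE dbo.ProductOrder (
    ID        int IDENTITY(1,1) NOT NULL,
    ProductID int NOT NULL,
    OrderID   int NOT NULL,
    -- Random, single-row access by surrogate: non-clustered.
    CONSTRAINT PK_ProductOrder PRIMARY KEY NONCLUSTERED (ID),
    -- Multi-row scans by product: clustered, and unique so a pair cannot repeat.
    CONSTRAINT UQ_ProductOrder_Product_Order UNIQUE CLUSTERED (ProductID, OrderID)
);
```

If the autonumber column brings no benefit, dropping it and declaring PRIMARY KEY CLUSTERED (ProductID, OrderID) gives the same physical order with one index fewer.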

NOTE. If you have conflicting requirements, there is a technique (a "trick") that can help. If all the columns in a query are found in an index, the database engine can treat that index as a candidate for satisfying the query on its own, without touching the table (a covering index). You can use this fact to store the data in more than one order, even when the orders conflict with each other. Just keep in mind the pros and cons of adding extra fields to an index, and make an informed decision once you understand the nature and frequency of the queries that will be run.
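
A minimal sketch of that trick, again assuming SQL Server syntax. The Quantity column, the index name and the literal 42 are hypothetical, added only so the query has something beyond the keys to read.

```sql
-- Hypothetical schema for illustration only.
CREATE TABLE dbo.ProductOrder (
    OrderID   int NOT NULL,
    ProductID int NOT NULL,
    Quantity  int NOT NULL,  -- hypothetical payload column
    -- Rows are stored in (OrderID, ProductID) order.
    CONSTRAINT PK_ProductOrder PRIMARY KEY CLUSTERED (OrderID, ProductID)
);

-- Second copy of the data in (ProductID, OrderID) order.
-- INCLUDE carries Quantity at the leaf level so the query below
-- can be answered from this index alone.
CREATE NONCLUSTERED INDEX IX_ProductOrder_Product_Order
    ON dbo.ProductOrder (ProductID, OrderID)
    INCLUDE (Quantity);

-- Every referenced column is in the index, so it covers this query.
SELECT OrderID, Quantity
FROM dbo.ProductOrder
WHERE ProductID = 42;
```

The cost is that every insert, and every update of Quantity, now has to maintain two structures instead of one.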

+5

The correct and only answer:

  • Primary Key ('orderid' , 'productid')
  • Another index on ('productid' , 'orderid')
  • Either one can be the clustered index, but the PK is clustered by default

Because:

  • You do not need an index on orderid alone or productid alone: the optimizer will use one of the two composite indexes
  • Most likely, you will use the table "both" ways
  • You do not need a surrogate key, because you already have them on the linked tables. A third column would just waste space.
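
In SQL Server syntax (an assumption, as are the int types, object names and literal values), that layout and its two access paths might look like this:

```sql
-- Hypothetical types and names; foreign keys omitted so this runs on its own.
CREATE TABLE dbo.ProductOrder (
    orderid   int NOT NULL,
    productid int NOT NULL,
    -- PRIMARY KEY is clustered by default when no other clustered index exists.
    CONSTRAINT PK_ProductOrder PRIMARY KEY (orderid, productid)
);

-- Same pair, opposite column order, for lookups that start from the product.
CREATE INDEX IX_ProductOrder_productid_orderid
    ON dbo.ProductOrder (productid, orderid);

-- Can seek on the leading column of the primary key.
SELECT productid FROM dbo.ProductOrder WHERE orderid = 1001;

-- Can seek on the leading column of the second index.
SELECT orderid FROM dbo.ProductOrder WHERE productid = 42;
```
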
+3

This sounds like a dynamic system in which many orders will be added. Therefore, the clustered index should be on your autonumber column.

You can make that column the primary key and put a separate unique index on the pair of columns. Or you can make the pair of columns the primary key (but non-clustered).

Whether you use a primary key or a unique index for each is up to you. But I would make sure the one that is clustered is on your autonumber column.
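
A sketch of the second variant, assuming SQL Server syntax; the ID column name, the types and the object names are illustrative only.

```sql
-- Hypothetical schema; foreign keys omitted so this runs on its own.
CREATE TABLE dbo.ProductOrder (
    ID        int IDENTITY(1,1) NOT NULL,
    ProductID int NOT NULL,
    OrderID   int NOT NULL,
    -- The pair is the primary key, explicitly non-clustered.
    CONSTRAINT PK_ProductOrder PRIMARY KEY NONCLUSTERED (ProductID, OrderID)
);

-- New rows get ever-increasing ID values, so inserts land at the end
-- of the clustered index rather than in the middle of it.
CREATE UNIQUE CLUSTERED INDEX CIX_ProductOrder_ID
    ON dbo.ProductOrder (ID);
```

The first variant swaps the roles: PRIMARY KEY CLUSTERED (ID) plus a UNIQUE NONCLUSTERED constraint on (ProductID, OrderID).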

+1

My preference has always been to create an autonumber for primary keys. Then I create a unique index on the two foreign keys so that the pair cannot be duplicated.

The reason I do this is that the more I normalize my data, the more keys I have to use in joins. I have ended up with views going six to seven levels deep, and if I use keys that migrate from one level to the next, I can potentially end up with n^2 keys in a join.

Try convincing my SQL developers to use all of those keys in a single query, and see how much they like me for it.

I keep it simple.
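
To illustrate the point about joins, here is a sketch in SQL Server syntax. The ProductOrderShipment child table and its columns are invented for the example; nothing like it appears in the question.

```sql
-- Junction table with a surrogate key plus a unique constraint on the pair.
CREATE TABLE dbo.ProductOrder (
    ID        int IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_ProductOrder PRIMARY KEY,
    ProductID int NOT NULL,
    OrderID   int NOT NULL,
    CONSTRAINT UQ_ProductOrder_Product_Order UNIQUE (ProductID, OrderID)
);

-- Hypothetical child table one level down: it carries a single key column.
CREATE TABLE dbo.ProductOrderShipment (
    ShipmentID     int IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_ProductOrderShipment PRIMARY KEY,
    ProductOrderID int NOT NULL
        CONSTRAINT FK_ProductOrderShipment_ProductOrder
        REFERENCES dbo.ProductOrder (ID),
    ShippedOn      date NOT NULL
);

-- The join needs one column instead of the (ProductID, OrderID) pair.
SELECT po.ProductID, po.OrderID, s.ShippedOn
FROM dbo.ProductOrderShipment AS s
JOIN dbo.ProductOrder AS po
    ON po.ID = s.ProductOrderID;
```

With a composite primary key, the child table would have to carry both ProductID and OrderID, and the join condition would need both columns; each further level adds more.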

0