Do I need a primary key for a table that has a composite UNIQUE constraint over 4 columns, one of which may be NULL?

I have the following table (PostgreSQL 8.3), which stores the prices of some products. Prices are synchronized with another database: most of the fields below (all except one) are not updated by our client, but are instead wiped and re-imported every once in a while to stay in sync with that other database:

 CREATE TABLE product_pricebands (
     template_sku    varchar(20) NOT NULL,
     colourid        integer REFERENCES colour (colourid) ON DELETE CASCADE,
     currencyid      integer NOT NULL REFERENCES currency (currencyid) ON DELETE CASCADE,
     siteid          integer NOT NULL REFERENCES site (siteid) ON DELETE CASCADE,
     master_price    numeric(10,2),
     my_custom_field boolean,
     UNIQUE (template_sku, siteid, currencyid, colourid)
 );

During synchronization, I basically DELETE most of the data above, except the rows where my_custom_field is TRUE (TRUE means the client updated that field through their CMS, so the record must not be discarded). Then I INSERT 100 to 1000 rows into the table, and UPDATE where the INSERT fails (i.e. where the combination (template_sku, siteid, currencyid, colourid) already exists). A rough sketch of this step is shown below.
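
For context, this is roughly what the synchronization looks like. It is only a sketch: the staging table incoming_prices and its columns are placeholders for illustration, and since PostgreSQL 8.3 has no INSERT ... ON CONFLICT, it uses the plain UPDATE-then-INSERT pattern:

 BEGIN;

 -- discard everything the client has not customised through the CMS
 DELETE FROM product_pricebands
 WHERE  my_custom_field IS NOT TRUE;

 -- update rows whose key combination already exists
 UPDATE product_pricebands p
 SET    master_price = i.master_price
 FROM   incoming_prices i
 WHERE  p.template_sku = i.template_sku
 AND    p.siteid       = i.siteid
 AND    p.currencyid   = i.currencyid
 AND    p.colourid IS NOT DISTINCT FROM i.colourid;

 -- insert combinations that are not there yet
 INSERT INTO product_pricebands
        (template_sku, colourid, currencyid, siteid, master_price, my_custom_field)
 SELECT i.template_sku, i.colourid, i.currencyid, i.siteid, i.master_price, FALSE
 FROM   incoming_prices i
 WHERE  NOT EXISTS (
    SELECT 1
    FROM   product_pricebands p
    WHERE  p.template_sku = i.template_sku
    AND    p.siteid       = i.siteid
    AND    p.currencyid   = i.currencyid
    AND    p.colourid IS NOT DISTINCT FROM i.colourid);

 COMMIT;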

My question is: what is the best practice here for a primary key? Do I even need one? I wanted to make (template_sku, siteid, currencyid, colourid) the primary key, but colourid can be NULL, and a nullable column cannot be part of a primary key.

From what I read on other forum posts, I think I did it right, and I just need to clarify:

1) Should I use a "serial" primary key in case I ever need it? At the moment, I do not do this and do not think that I will ever be, because the important data in the table are the price and my custom field, which is determined only by a combination (template_sku, siteid, currencyid, colourid).

2) Since (template_sku, siteid, currencyid, colourid) is the combination I will use to look up the price of a product, should I add extra indexes on individual columns, for example on "template_sku", which is a varchar? Or is the UNIQUE constraint already a good index for my SELECTs?

null indexing postgresql database-design primary-key
1 answer

Should I use a "serial" primary key in case I ever need to?

You can easily add a serial column later if you need one:

 ALTER TABLE product_pricebands ADD COLUMN id serial; 

The column will be filled with unique values automatically. You can even make it the primary key in the same statement (if no primary key is defined yet):

 ALTER TABLE product_pricebands ADD COLUMN id serial PRIMARY KEY; 

If you are referencing the table from other tables, I would recommend such a surrogate primary key, because linking by four columns is rather cumbersome. It is also slower in SELECTs with JOINs.
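
For example, a referencing table (hypothetical, purely to illustrate the point, and assuming id was added as the primary key above) then only needs a single integer column and a single-column join:

 -- hypothetical referencing table
 CREATE TABLE priceband_history (
     history_id   serial PRIMARY KEY,
     priceband_id integer NOT NULL REFERENCES product_pricebands (id) ON DELETE CASCADE,
     changed_at   timestamp NOT NULL DEFAULT now()
 );
 -- versus dragging (template_sku, siteid, currencyid, colourid) into every
 -- referencing table and joining on all four columns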

Either way, you should define a primary key. A UNIQUE index that includes a nullable column is not a full replacement: it allows duplicate combinations when colourid is NULL, because two NULL values are never considered equal. That can lead to trouble.
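
A quick demonstration of that loophole, with made-up values and assuming only the four-column UNIQUE constraint is in place:

 INSERT INTO product_pricebands (template_sku, colourid, currencyid, siteid)
 VALUES ('SKU-1', NULL, 1, 1);  -- accepted

 INSERT INTO product_pricebands (template_sku, colourid, currencyid, siteid)
 VALUES ('SKU-1', NULL, 1, 1);  -- also accepted: NULL never equals NULL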


As the

colourid field may be NULL

the combination (template_sku, siteid, currencyid, colourid) cannot be the PRIMARY KEY. You can create two unique indexes instead. First, a UNIQUE constraint like the one you already have (it creates an index automatically):

 ALTER TABLE product_pricebands
   ADD CONSTRAINT product_pricebands_uni_idx
   UNIQUE (template_sku, siteid, currencyid, colourid);

This index perfectly covers the query you mention in 2).
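
For instance, a price lookup like the following (values made up) can be served by that index:

 SELECT master_price
 FROM   product_pricebands
 WHERE  template_sku = 'SKU-1'
 AND    siteid       = 1
 AND    currencyid   = 1
 AND    colourid     = 2;
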
Additionally, create a partial unique index if you want to avoid "duplicates" with (colourid IS NULL):

 CREATE UNIQUE INDEX product_pricebands_uni_null_idx ON product_pricebands (template_sku, siteid, currencyid) WHERE colourid IS NULL; 

Together, those two indexes cover all bases. I wrote more about this technique in a related answer on dba.SE.


A simple alternative to all of the above is to make colourid NOT NULL and create a primary key instead of the above product_pricebands_uni_idx.
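
A sketch of that alternative. Using 0 as a "no colour" sentinel is my assumption, not part of your setup, and a matching row must exist in colour because of the foreign key:

 UPDATE product_pricebands SET colourid = 0 WHERE colourid IS NULL;
 ALTER TABLE product_pricebands ALTER COLUMN colourid SET NOT NULL;
 ALTER TABLE product_pricebands ALTER COLUMN colourid SET DEFAULT 0;
 ALTER TABLE product_pricebands
     ADD PRIMARY KEY (template_sku, siteid, currencyid, colourid);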


Also, since you

basically DELETE most of the data

for your refill operation, it will be faster to drop the indexes that are not needed during the refill and recreate them afterwards. Building an index from scratch is an order of magnitude faster than maintaining it incrementally while rows are added one by one.
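
A sketch of that pattern, using the index names from above:

 DROP INDEX product_pricebands_uni_null_idx;
 ALTER TABLE product_pricebands DROP CONSTRAINT product_pricebands_uni_idx;

 -- ... bulk DELETE / INSERT / UPDATE here ...

 ALTER TABLE product_pricebands
     ADD CONSTRAINT product_pricebands_uni_idx
     UNIQUE (template_sku, siteid, currencyid, colourid);
 CREATE UNIQUE INDEX product_pricebands_uni_null_idx
     ON product_pricebands (template_sku, siteid, currencyid)
     WHERE colourid IS NULL;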

How do you know which indexes are used (needed)?

  • Test your queries with EXPLAIN ANALYZE .
  • Or use the built-in statistics. pgAdmin displays them on a separate tab for the selected object, or you can query the statistics views directly (see the example after this list).
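
For instance, this query against the built-in statistics shows how often each index on the table has been scanned:

 SELECT indexrelname, idx_scan, idx_tup_read, idx_tup_fetch
 FROM   pg_stat_user_indexes
 WHERE  relname = 'product_pricebands';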

It may also be faster to select the few rows with my_custom_field = TRUE into a temporary table, TRUNCATE the base table and re-INSERT the survivors. Whether that works depends on any foreign keys referencing the table. It would look like this:

 CREATE TEMP TABLE pr_tmp AS
 SELECT * FROM product_pricebands WHERE my_custom_field;

 TRUNCATE product_pricebands;

 INSERT INTO product_pricebands
 SELECT * FROM pr_tmp;

This avoids a lot of vacuuming.

