Is EXCEPT faster than JOIN when the tables have the same columns?

To find all the changes between the two databases, I join the tables on the primary key and use the date_modified field to select the latest record. Since the tables have the same schema, would using EXCEPT improve performance? I would like to rewrite the query with EXCEPT, but I'm not sure whether the implementation of EXCEPT performs a JOIN in every case anyway. Hopefully someone has a more technical explanation of when to use EXCEPT.
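A minimal PostgreSQL sketch of the two approaches, assuming both copies of the table are visible in one session as the hypothetical tables items_a (old) and items_b (new), filled with made-up rows. Note that the two queries do not answer exactly the same question: the key join with a date_modified comparison finds modified rows only, while EXCEPT returns every row of items_b that has no identical row in items_a, so it reports new rows as well as modified ones.

```sql
-- Hypothetical stand-ins for the same table in the two databases.
CREATE TEMP TABLE items_a (id bigint PRIMARY KEY, name text NOT NULL, date_modified timestamp NOT NULL);
CREATE TEMP TABLE items_b (id bigint PRIMARY KEY, name text NOT NULL, date_modified timestamp NOT NULL);

INSERT INTO items_a VALUES
  (1, 'bolt',  '2015-06-01 10:00'),
  (2, 'nut',   '2015-06-01 10:00'),
  (3, 'screw', '2015-06-02 09:00');
INSERT INTO items_b VALUES
  (1, 'bolt',  '2015-06-01 10:00'),   -- unchanged
  (2, 'nut',   '2015-06-03 12:00'),   -- modified
  (4, 'rivet', '2015-06-03 12:00');   -- new

-- JOIN formulation: match on the primary key, keep rows with a newer timestamp.
-- Returns id 2 only; new rows would need a separate anti-join.
SELECT b.*
FROM items_b AS b
JOIN items_a AS a ON a.id = b.id
WHERE b.date_modified > a.date_modified;

-- EXCEPT formulation: rows of items_b with no identical row (all columns) in items_a.
-- Returns ids 2 and 4.
SELECT * FROM items_b
EXCEPT
SELECT * FROM items_a
ORDER BY id;
```

Running the EXCEPT in the other direction (items_a EXCEPT items_b) would list deleted rows together with the old versions of modified rows.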

2 answers

No one can tell you that EXCEPT will always or never outperform the equivalent OUTER JOIN. The optimizer will choose an appropriate execution plan regardless of how you express your intent.
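Because the plan is the optimizer's choice, the practical way to find out whether EXCEPT ends up doing a join on a particular server is to look at the plans. A minimal PostgreSQL sketch, using the hypothetical temporary tables t_old and t_new with generated rows: run EXPLAIN (ANALYZE, BUFFERS) on each formulation and compare the plan nodes (for example a set-operation node versus an anti-join) and the timings. Keep in mind that EXPLAIN ANALYZE actually executes the query.

```sql
-- Hypothetical generated data; replace with the real tables.
CREATE TEMP TABLE t_old AS
  SELECT g AS id, md5(g::text) AS payload FROM generate_series(1, 100000) AS g;
CREATE TEMP TABLE t_new AS
  SELECT g AS id, md5(g::text) AS payload FROM generate_series(1, 100500) AS g;
ALTER TABLE t_old ADD PRIMARY KEY (id);
ALTER TABLE t_new ADD PRIMARY KEY (id);
ANALYZE t_old;
ANALYZE t_new;

-- Formulation 1: EXCEPT over all columns.
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, payload FROM t_new
EXCEPT
SELECT id, payload FROM t_old;

-- Formulation 2: LEFT JOIN anti-join on the key plus the other column.
-- Equivalent here only because id is unique and neither column contains NULLs.
EXPLAIN (ANALYZE, BUFFERS)
SELECT n.id, n.payload
FROM t_new AS n
LEFT JOIN t_old AS o ON o.id = n.id AND o.payload = n.payload
WHERE o.id IS NULL;
```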

So here is my recommendation:


Use EXCEPT when at least one of the following is true:

  • The query is more readable (this will almost always be true).
  • Performance is improved.

And BOTH of the following are true:

  • The query produces semantically identical results, and you can demonstrate this through sufficient regression testing, including all edge cases.
  • Performance is not degraded (again, in all edge cases, as well as under environmental changes such as clearing the buffer pool, updating statistics, clearing the plan cache, and restarting the service).
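The regression-testing criterion can be made concrete with a multiset comparison. A minimal PostgreSQL sketch, with the hypothetical tables t_left and t_right and two candidate queries (one EXCEPT, one NOT EXISTS): (A EXCEPT ALL B) UNION ALL (B EXCEPT ALL A) is empty only when both queries return the same rows the same number of times. With this made-up data it is deliberately not empty, because plain EXCEPT removes duplicate rows while NOT EXISTS keeps them.

```sql
-- Hypothetical sample data; t_left contains a duplicate row on purpose.
CREATE TEMP TABLE t_left  (k int, v text);
CREATE TEMP TABLE t_right (k int, v text);
INSERT INTO t_left  VALUES (1, 'a'), (2, 'b'), (2, 'b');
INSERT INTO t_right VALUES (1, 'a');

WITH
  query_a AS (            -- candidate 1: EXCEPT (removes duplicates)
    SELECT k, v FROM t_left
    EXCEPT
    SELECT k, v FROM t_right
  ),
  query_b AS (            -- candidate 2: NOT EXISTS (keeps duplicates)
    SELECT l.k, l.v
    FROM t_left AS l
    WHERE NOT EXISTS (
      SELECT 1 FROM t_right AS r
      WHERE r.k = l.k AND r.v = l.v
    )
  )
-- Rows that one query returns more often than the other.
SELECT 'only in A' AS side, a_minus_b.*
FROM (SELECT * FROM query_a EXCEPT ALL SELECT * FROM query_b) AS a_minus_b
UNION ALL
SELECT 'only in B', b_minus_a.*
FROM (SELECT * FROM query_b EXCEPT ALL SELECT * FROM query_a) AS b_minus_a;
```

With this data the check returns a single row (only in B, 2, b), flagging the duplicate. EXCEPT ALL also treats NULLs as equal, so the comparison itself is not confused by NULL columns.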

It is important to note that writing an equivalent EXCEPT query can become a challenge as the JOIN grows more complex and/or you rely on duplicates in some columns but not in others. Writing a NOT EXISTS equivalent, while slightly less readable than EXCEPT, should be far more trivial to accomplish, and it often leads to a better plan (but note that I would never say ALWAYS or NEVER, except in the way I just did).
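One edge case to watch when writing the NOT EXISTS equivalent is NULL handling: EXCEPT treats two NULLs as not distinct, while a plain `=` comparison never matches NULL to NULL. A minimal PostgreSQL sketch with the hypothetical tables old_rows and new_rows; using IS NOT DISTINCT FROM on the nullable column restores the EXCEPT semantics, although it may keep an index on that column from being used, which is worth checking in the plan.

```sql
-- Hypothetical sample data with a NULL in a compared column.
CREATE TEMP TABLE old_rows (id int, note text);
CREATE TEMP TABLE new_rows (id int, note text);
INSERT INTO old_rows VALUES (1, 'x'), (2, NULL);
INSERT INTO new_rows VALUES (1, 'x'), (2, NULL), (3, 'y');

-- EXCEPT: (2, NULL) matches (2, NULL), so only id 3 is returned.
SELECT id, note FROM new_rows
EXCEPT
SELECT id, note FROM old_rows;

-- Naive NOT EXISTS: NULL = NULL is not true, so ids 2 and 3 are returned.
SELECT n.id, n.note
FROM new_rows AS n
WHERE NOT EXISTS (
  SELECT 1 FROM old_rows AS o
  WHERE o.id = n.id AND o.note = n.note
);

-- NOT EXISTS with IS NOT DISTINCT FROM: only id 3, like EXCEPT
-- (EXCEPT would additionally remove duplicate rows; NOT EXISTS does not).
SELECT n.id, n.note
FROM new_rows AS n
WHERE NOT EXISTS (
  SELECT 1 FROM old_rows AS o
  WHERE o.id = n.id AND o.note IS NOT DISTINCT FROM n.note
);
```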

In this blog post, I demonstrate at least one case where EXCEPT outperforms both a properly constructed LEFT OUTER JOIN and, of course, the equivalent NOT EXISTS variant.


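In the following example (PostgreSQL 9.4.3), the LEFT JOIN version is about 2.9 times faster than the EXCEPT version: 1136 ms versus 3328 ms, or roughly 66% less execution time.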

Example:

There are three tables: suppliers, parts, and shipments. We need to get all the parts that are not supplied by any supplier in London.

Schema (all columns involved in the queries are indexed):

```sql
CREATE TABLE suppliers (
  id   bigint primary key,
  city character varying NOT NULL
);

CREATE TABLE parts (
  id   bigint primary key,
  name character varying NOT NULL
);

CREATE TABLE shipments (
  id          bigint primary key,
  supplier_id bigint NOT NULL,
  part_id     bigint NOT NULL
);
```

Row counts:

```
db=# SELECT COUNT(*) FROM suppliers;
  count
---------
 1281280
(1 row)

db=# SELECT COUNT(*) FROM parts;
  count
---------
 1280000
(1 row)

db=# SELECT COUNT(*) FROM shipments;
  count
---------
 1760161
(1 row)
```

Query using EXCEPT:

```sql
SELECT parts.*
FROM parts
EXCEPT
SELECT parts.*
FROM parts
LEFT JOIN shipments ON (parts.id = shipments.part_id)
LEFT JOIN suppliers ON (shipments.supplier_id = suppliers.id)
WHERE suppliers.city = 'London';
-- Execution time: 3327.728 ms
```

Query using a LEFT JOIN against a table returned by a subquery (an anti-join):

```sql
SELECT parts.*
FROM parts
LEFT JOIN (
  SELECT parts.id
  FROM parts
  LEFT JOIN shipments ON (parts.id = shipments.part_id)
  LEFT JOIN suppliers ON (shipments.supplier_id = suppliers.id)
  WHERE suppliers.city = 'London'
) AS subquery_tbl ON (parts.id = subquery_tbl.id)
WHERE subquery_tbl.id IS NULL;
-- Execution time: 1136.393 ms
```
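Two further formulations of the same request may be worth timing alongside these: a NOT EXISTS anti-join (as recommended in the first answer) and an EXCEPT that compares only the key. The EXCEPT query above has to compare every column of parts.*, while the LEFT JOIN version compares only id, which may account for part of the gap. This is a minimal sketch with no timings: the temporary tables contain a few made-up rows so that the block runs on its own, and the secondary indexes are an assumption about what "indexed" means in the answer.

```sql
-- Toy temporary copies of the answer's tables (made-up rows, not the benchmark data).
-- In PostgreSQL, temporary tables shadow permanent tables of the same name in this session.
CREATE TEMP TABLE suppliers (id bigint PRIMARY KEY, city varchar NOT NULL);
CREATE TEMP TABLE parts     (id bigint PRIMARY KEY, name varchar NOT NULL);
CREATE TEMP TABLE shipments (id bigint PRIMARY KEY, supplier_id bigint NOT NULL, part_id bigint NOT NULL);

-- Assumed secondary indexes on the join and filter columns.
CREATE INDEX ON shipments (part_id);
CREATE INDEX ON shipments (supplier_id);
CREATE INDEX ON suppliers (city);

INSERT INTO suppliers VALUES (1, 'London'), (2, 'Paris');
INSERT INTO parts     VALUES (10, 'bolt'), (20, 'nut'), (30, 'screw');
INSERT INTO shipments VALUES
  (100, 1, 10),   -- bolt: shipped by a London supplier
  (101, 2, 20);   -- nut: shipped by a Paris supplier only
                  -- screw (30): never shipped

-- NOT EXISTS anti-join. Inner joins suffice: in the original queries the
-- WHERE suppliers.city = 'London' filter discards the NULL-extended rows anyway.
SELECT p.*
FROM parts AS p
WHERE NOT EXISTS (
  SELECT 1
  FROM shipments AS sh
  JOIN suppliers AS s ON s.id = sh.supplier_id
  WHERE sh.part_id = p.id
    AND s.city = 'London'
)
ORDER BY p.id;

-- EXCEPT on the key column only, joined back to parts for the remaining columns.
SELECT p.*
FROM parts AS p
JOIN (
  SELECT id FROM parts
  EXCEPT
  SELECT sh.part_id
  FROM shipments AS sh
  JOIN suppliers AS s ON s.id = sh.supplier_id
  WHERE s.city = 'London'
) AS not_from_london USING (id)
ORDER BY p.id;
```

Both queries return parts 20 and 30 for this sample data; which one is faster on the real tables can only be settled by running them there.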
