PostgreSQL UNION takes 10 times longer than running the individual queries

I am trying to get the difference between two almost identical tables in PostgreSQL. The queries I currently run are:

SELECT * FROM tableA EXCEPT SELECT * FROM tableB; 

and

 SELECT * FROM tableB EXCEPT SELECT * FROM tableA; 

Each of the above queries takes about 2 minutes (the tables are large).

I wanted to combine these two queries in the hope of saving time, so I tried:

 SELECT * FROM tableA EXCEPT SELECT * FROM tableB UNION SELECT * FROM tableB EXCEPT SELECT * FROM tableA; 

And while it works, it takes 20 minutes! I would have expected it to take no more than 4 minutes, the sum of the two individual query times.

Is there any kind of extra UNION work that makes it take so long? Or can I speed it up (with or without UNION)?

UPDATE: Running the query with UNION ALL takes 15 minutes, almost 4 times as long as running the two queries individually. Am I right in saying that UNION (ALL) is not going to speed this up at all?

+7
sql diff postgresql union
4 answers

Regarding your "extra work" question: yes. UNION not only combines the results of the two queries, it also makes a pass over the combined rows and removes duplicates. This is the same work as a DISTINCT.

For this reason, especially when combined with your EXCEPT statements, UNION ALL is likely to be faster.

More details here: http://www.postgresql.org/files/documentation/books/aw_pgsql/node80.html

+11

In addition to combining the results of the first and second queries, UNION also removes duplicate rows by default (see http://www.postgresql.org/docs/8.1/static/sql-select.html). The extra work of checking for duplicate rows across the two queries is probably responsible for the extra time. In this situation there should be no duplicate rows, so the duplicate check can be avoided by specifying UNION ALL.

 SELECT * FROM tableA EXCEPT SELECT * FROM tableB UNION ALL SELECT * FROM tableB EXCEPT SELECT * FROM tableA; 
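
To make the duplicate-removal step concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, purely so the example is self-contained, and the toy rows and the columns id and val are made up for illustration. The two EXCEPT branches are wrapped in subqueries so the grouping is explicit. With that grouping the two differences cannot share a row, so UNION and UNION ALL return the same rows and the duplicate check of UNION does no useful work.

```python
import sqlite3

# SQLite stands in for PostgreSQL; table contents are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (id INTEGER, val TEXT);
    CREATE TABLE tableB (id INTEGER, val TEXT);
    INSERT INTO tableA VALUES (1, 'x'), (2, 'y'), (3, 'z');
    INSERT INTO tableB VALUES (2, 'y'), (3, 'changed'), (4, 'w');
""")

# UNION removes duplicate rows; UNION ALL keeps them.
dedup = conn.execute("SELECT 1 UNION SELECT 1").fetchall()
keep = conn.execute("SELECT 1 UNION ALL SELECT 1").fetchall()

# Both differences, with explicit grouping via subqueries.
DIFF = """
    SELECT * FROM (SELECT * FROM tableA EXCEPT SELECT * FROM tableB) AS T1
    {op}
    SELECT * FROM (SELECT * FROM tableB EXCEPT SELECT * FROM tableA) AS T2
"""
with_union = sorted(conn.execute(DIFF.format(op="UNION")).fetchall())
with_union_all = sorted(conn.execute(DIFF.format(op="UNION ALL")).fetchall())

print(dedup, keep)                   # one row versus two rows
print(with_union == with_union_all)  # same rows either way
print(with_union)
```

This only shows that the results agree. How much time UNION ALL saves on the real tables would have to be measured there, for example with EXPLAIN ANALYZE.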
+3

I do not think your code returns the result set you intend. I rather think you want to do this:

 SELECT * FROM ( SELECT * FROM tableA EXCEPT SELECT * FROM tableB ) AS T1 UNION SELECT * FROM ( SELECT * FROM tableB EXCEPT SELECT * FROM tableA ) AS T2; 

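In other words, you want the set of rows that appear in exactly one of the two tables. If so, you need to read up on the precedence of relational operators in SQL ;) UNION and EXCEPT have equal precedence and are evaluated left to right, while INTERSECT binds more tightly. Once you know that, you will see that the above can be rationalised as: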

 SELECT * FROM tableA UNION SELECT * FROM tableB EXCEPT SELECT * FROM tableA INTERSECT SELECT * FROM tableB; 

FWIW, using subqueries (derived tables T1 and T2) to make the otherwise implicit operator precedence explicit, your original query is equivalent to:

 SELECT * FROM ( SELECT * FROM ( SELECT * FROM tableA EXCEPT SELECT * FROM tableB ) AS T2 UNION SELECT * FROM tableB ) AS T1 EXCEPT SELECT * FROM tableA; 

The above can be rationalised to:

 SELECT * FROM tableB EXCEPT SELECT * FROM tableA; 

... and I think that is not what you intended.
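
The precedence argument can be checked on toy data. The following is a minimal sketch that uses Python's built-in sqlite3 module in place of PostgreSQL, an assumption made only to keep it self-contained; the rows and the columns id and val are hypothetical. For a chain of only UNION and EXCEPT, both engines group left to right, so the unparenthesised query behaves the same way in each. SQLite does not give INTERSECT higher precedence the way PostgreSQL does, so the union-minus-intersection form is written here with explicit subqueries.

```python
import sqlite3

# SQLite stands in for PostgreSQL; table contents are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (id INTEGER, val TEXT);
    CREATE TABLE tableB (id INTEGER, val TEXT);
    INSERT INTO tableA VALUES (1, 'x'), (2, 'y'), (3, 'z');
    INSERT INTO tableB VALUES (2, 'y'), (3, 'changed'), (4, 'w');
""")

def rows(sql):
    """Run a query and return its rows in a stable order."""
    return sorted(conn.execute(sql).fetchall())

# The query from the question, grouped left to right by the engine.
original = rows("""
    SELECT * FROM tableA EXCEPT SELECT * FROM tableB
    UNION
    SELECT * FROM tableB EXCEPT SELECT * FROM tableA
""")
b_minus_a = rows("SELECT * FROM tableB EXCEPT SELECT * FROM tableA")

# The intended query: each difference computed first, then combined.
grouped = rows("""
    SELECT * FROM (SELECT * FROM tableA EXCEPT SELECT * FROM tableB) AS T1
    UNION
    SELECT * FROM (SELECT * FROM tableB EXCEPT SELECT * FROM tableA) AS T2
""")

# Same set, written as (A UNION B) EXCEPT (A INTERSECT B).
union_minus_intersection = rows("""
    SELECT * FROM (SELECT * FROM tableA UNION SELECT * FROM tableB) AS all_rows
    EXCEPT
    SELECT * FROM (SELECT * FROM tableA INTERSECT SELECT * FROM tableB) AS common_rows
""")

print(original == b_minus_a)                # the original collapses to B EXCEPT A
print(grouped == union_minus_intersection)  # both give the full difference
print(grouped)
```

On this data the original query loses the rows that exist only in tableA, which is the behaviour described above.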

+2

You can use tableA FULL OUTER JOIN tableB, which will give you what you want (with the proper join condition) in a single pass over each table, and is likely to be faster than the 2 queries above.
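
As a rough illustration of the join approach, here is a sketch under several assumptions: Python's built-in sqlite3 module stands in for PostgreSQL, the tables have just the hypothetical columns id and val, and none of the compared columns contain NULL (a plain = comparison never matches NULLs, whereas EXCEPT treats them as equal). The join condition compares every column, and rows with no partner on the other side are the differences. SQLite only gained FULL OUTER JOIN in version 3.39, so the sketch falls back to two LEFT JOINs on older builds; that fallback exists only to keep the example runnable and is not part of the suggestion.

```python
import sqlite3

# SQLite stands in for PostgreSQL; table contents are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (id INTEGER, val TEXT);
    CREATE TABLE tableB (id INTEGER, val TEXT);
    INSERT INTO tableA VALUES (1, 'x'), (2, 'y'), (3, 'z');
    INSERT INTO tableB VALUES (2, 'y'), (3, 'changed'), (4, 'w');
""")

# Join on every column; unmatched rows on either side are the differences.
FULL_JOIN = """
    SELECT COALESCE(a.id, b.id) AS id,
           COALESCE(a.val, b.val) AS val,
           CASE WHEN b.id IS NULL THEN 'only in A' ELSE 'only in B' END AS side
    FROM tableA AS a
    FULL OUTER JOIN tableB AS b ON a.id = b.id AND a.val = b.val
    WHERE a.id IS NULL OR b.id IS NULL
"""

# Emulation for SQLite builds older than 3.39, which lack FULL OUTER JOIN.
FALLBACK = """
    SELECT a.id, a.val, 'only in A' AS side
    FROM tableA AS a LEFT JOIN tableB AS b ON a.id = b.id AND a.val = b.val
    WHERE b.id IS NULL
    UNION ALL
    SELECT b.id, b.val, 'only in B'
    FROM tableB AS b LEFT JOIN tableA AS a ON a.id = b.id AND a.val = b.val
    WHERE a.id IS NULL
"""

sql = FULL_JOIN if sqlite3.sqlite_version_info >= (3, 39, 0) else FALLBACK
diff = sorted(conn.execute(sql).fetchall(), key=lambda r: (r[2], r[0], r[1]))
for row in diff:
    print(row)
```

Unlike the EXCEPT queries, this version also reports which table each differing row came from. Whether it is actually faster on the real tables depends on the plan PostgreSQL chooses, which EXPLAIN ANALYZE would show.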

Please post more information.

-2
