Why does SQL cost explode with a simple "or"?

I have the following statement to find unique names in my data (~ 1 million records):

    select Prename, Surname
    from person p1
    where Prename is not null
      and Surname is not null
      and not exists (
            select *
            from person p2
            where (p1.Surname = p2.Surname OR p1.Surname = p2.Altname)
              and p2.Prename LIKE CONCAT(CONCAT('%', p1.Prename), '%')
              and p2.id <> p1.id
          )
      and inv_date IS NULL

Oracle shows an enormous cost of 1,477,315,000, and execution does not finish within 5 minutes. Simply splitting the OR into its own NOT EXISTS subclause brings the run time down to 0.5 s and the cost down to 45,000:

    select Prename, Surname
    from person p1
    where Prename is not null
      and Surname is not null
      and not exists (
            select *
            from person p2
            where p1.Surname = p2.Surname
              and p2.Prename LIKE CONCAT(CONCAT('%', p1.Prename), '%')
              and p2.id <> p1.id
          )
      and not exists (
            select *
            from person p2
            where p1.Surname = p2.Altname
              and p2.Prename LIKE CONCAT(CONCAT('%', p1.Prename), '%')
              and p2.id <> p1.id
          )
      and inv_date IS NULL

I am not asking how to tune this as far as possible, since the query is only rarely executed, and I know that the CONCAT pattern defeats any index; I just wonder where this high cost comes from. Both queries seem semantically equivalent to me.

+7
3 answers

The answer is in the EXPLAIN PLAN output for your queries. They may be semantically equivalent, but the execution plans behind the scenes are significantly different.

EXISTS works differently from a JOIN, and essentially your OR filter is what joins the tables.

There is no JOIN in the second query, since you only retrieve records from one table.
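To compare the two plans side by side, something like the following could be run in Oracle. This is only a sketch: it assumes the default `PLAN_TABLE` is available to the session, and the statement IDs `or_version` and `split_version` are arbitrary labels chosen here for illustration.

```sql
-- Plan for the original query with OR inside a single NOT EXISTS
EXPLAIN PLAN SET STATEMENT_ID = 'or_version' FOR
select Prename, Surname
from person p1
where Prename is not null
  and Surname is not null
  and not exists (
        select *
        from person p2
        where (p1.Surname = p2.Surname OR p1.Surname = p2.Altname)
          and p2.Prename LIKE CONCAT(CONCAT('%', p1.Prename), '%')
          and p2.id <> p1.id
      )
  and inv_date IS NULL;

-- Show the stored plan, including cost and predicate information
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY('PLAN_TABLE', 'or_version', 'TYPICAL'));

-- Plan for the rewritten query with two separate NOT EXISTS subqueries
EXPLAIN PLAN SET STATEMENT_ID = 'split_version' FOR
select Prename, Surname
from person p1
where Prename is not null
  and Surname is not null
  and not exists (
        select *
        from person p2
        where p1.Surname = p2.Surname
          and p2.Prename LIKE CONCAT(CONCAT('%', p1.Prename), '%')
          and p2.id <> p1.id
      )
  and not exists (
        select *
        from person p2
        where p1.Surname = p2.Altname
          and p2.Prename LIKE CONCAT(CONCAT('%', p1.Prename), '%')
          and p2.id <> p1.id
      )
  and inv_date IS NULL;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY('PLAN_TABLE', 'split_version', 'TYPICAL'));
```

Comparing the operations and the "Predicate Information" sections of the two outputs should show where the optimizer treats the OR condition differently from the two AND-ed subqueries.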

+6

The results of your two queries may be semantically equivalent, but their execution is not operationally equivalent. Your second example never uses the OR operator to combine predicates. All the predicates in your second example are combined using AND.

Performance is better because, if the first predicate combined with AND does not evaluate to true, the second (or any later) predicate is skipped and not evaluated. With OR, both (or all) predicates often have to be evaluated, which slows down your query. (ORed predicates are checked until one evaluates to true.)
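The semantic equivalence of the two queries can be written out as a short logic identity. In this sketch the symbols are shorthand introduced here, not part of the original queries: $A$ stands for `p1.Surname = p2.Surname`, $B$ for `p1.Surname = p2.Altname`, and $C$ for the remaining conditions (the LIKE test and `p2.id <> p1.id`).

```latex
\[
\neg\exists\, p_2 \,\bigl[(A \lor B) \land C\bigr]
\;\equiv\;
\neg\exists\, p_2 \,\bigl[(A \land C) \lor (B \land C)\bigr]
\;\equiv\;
\neg\exists\, p_2 \,[A \land C] \;\land\; \neg\exists\, p_2 \,[B \land C]
\]
```

The first step distributes $C$ over the OR; the second uses the fact that "no row satisfies either condition" means "no row satisfies the first and no row satisfies the second". The right-hand side is exactly the form with two NOT EXISTS subqueries joined by AND.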

+2

I would consider testing the query rewritten as shown below ... Do a direct join from one to the other, qualifying the criteria of what IS considered a match ... Then, in the WHERE clause, throw the row out if it does not come up with a match.

    select p1.Prename, p1.Surname
    from person p1
    join person p2
      on p1.ID <> p2.ID
     and ( p1.Surname = p2.Surname or p1.SurName = p2.AltName )
     and p2.PreName like concat( concat( '%', p1.Prename ), '%' )
    where p1.PreName is not null
      and p1.SurName is not null
      and p1.Inv_date is null
      and p2.id is null

Per your comments, but from what it appeared you were looking for ... NO, do NOT do a left outer join ... If you are looking for names that are ALIKE that you want to PURGE (however you'll handle that), you only want to PRE-QUALIFY those records that DO have a match via the self-join (hence, a normal join). If you have a name that does not have a similar name, you probably want to leave it alone ... so it will automatically be left out of the result set.

Now the WHERE clause kicks in ... You have an active person on the left ... that HAS a person on the right ... These are the duplicates ... so you have a match. Now, throwing in the logical "p2.ID IS NULL" creates the same result as NOT EXISTS, giving the final results.

I have changed my query back to a normal "join".
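For comparison, the join form usually taken to mirror NOT EXISTS is an anti-join written with a LEFT JOIN: with an inner join, `p2.id` can never be NULL when `id` is a non-null key, so `p2.id is null` would filter out every row. The sketch below is not the answerer's final query; it assumes that `id` is never NULL in `person`, and it still carries the OR condition, so it makes no claim about being faster.

```sql
-- Anti-join sketch: keep p1 rows for which no matching p2 row was found.
-- Assumption: person.id is a non-null key, so "p2.id is null" can only
-- be true for rows where the outer join found no partner.
select p1.Prename, p1.Surname
from person p1
left join person p2
  on p1.id <> p2.id
 and ( p1.Surname = p2.Surname or p1.Surname = p2.Altname )
 and p2.Prename like concat( concat( '%', p1.Prename ), '%' )
where p1.Prename is not null
  and p1.Surname is not null
  and p1.inv_date is null
  and p2.id is null
```

Because every unmatched `p1` row appears exactly once with NULLs on the `p2` side, this should return the same names as the original NOT EXISTS query under the stated assumption.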

+1
