I have the following statement to find unique names in my data (~ 1 million records):
    select Prename, Surname
    from person p1
    where Prename is not null
      and Surname is not null
      and not exists (
        select *
        from person p2
        where (p1.Surname = p2.Surname OR p1.Surname = p2.Altname)
          and p2.Prename LIKE CONCAT(CONCAT('%', p1.Prename), '%')
          and p2.id <> p1.id
      )
      and inv_date IS NULL
Oracle reports an enormous cost of 1,477,315,000, and the execution has not finished after 5 minutes. Simply splitting the OR into its own NOT EXISTS subclause brings the run time down to 0.5 s and the cost to 45,000:
    select Prename, Surname
    from person p1
    where Prename is not null
      and Surname is not null
      and not exists (
        select *
        from person p2
        where p1.Surname = p2.Surname
          and p2.Prename LIKE CONCAT(CONCAT('%', p1.Prename), '%')
          and p2.id <> p1.id
      )
      and not exists (
        select *
        from person p2
        where p1.Surname = p2.Altname
          and p2.Prename LIKE CONCAT(CONCAT('%', p1.Prename), '%')
          and p2.id <> p1.id
      )
      and inv_date IS NULL
My question is not how to tune this as far as possible, since it is only a rarely executed query, and I know that the CONCAT in the LIKE pattern defeats any index. I just wonder where this high cost comes from. Both queries seem semantically equivalent to me.
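To back up the equivalence claim, here is a minimal sketch that runs both forms on a toy table and compares the results. It rests on several assumptions: it uses Python's built-in `sqlite3` instead of Oracle, it writes the pattern with `||` instead of `CONCAT`, and the sample rows are made up for illustration. It only checks that the two queries return the same rows; it says nothing about Oracle's cost estimate.

```python
import sqlite3

# Hypothetical sample rows: (id, Prename, Surname, Altname, inv_date)
ROWS = [
    (1, "Anna", "Smith", None, None),
    (2, "Annabel", "Smith", None, None),        # Prename contains "Anna", same Surname
    (3, "Bob", "Jones", None, None),
    (4, "Bobby", "Miller", "Jones", None),      # Altname equals the Surname of id 3
    (5, "Carl", "Young", None, None),           # no similar person
    (6, "Dora", "Young", None, "2020-01-01"),   # filtered out: inv_date is set
    (7, None, "Young", None, None),             # filtered out: Prename is null
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person (id INTEGER PRIMARY KEY, Prename TEXT, "
    "Surname TEXT, Altname TEXT, inv_date TEXT)"
)
conn.executemany("INSERT INTO person VALUES (?, ?, ?, ?, ?)", ROWS)

# Original form: one NOT EXISTS with an OR inside.
Q_OR = """
select Prename, Surname from person p1
where Prename is not null and Surname is not null
  and not exists (
    select * from person p2
    where (p1.Surname = p2.Surname or p1.Surname = p2.Altname)
      and p2.Prename like '%' || p1.Prename || '%'
      and p2.id <> p1.id)
  and inv_date is null
"""

# Rewritten form: the OR split into two NOT EXISTS subqueries.
Q_SPLIT = """
select Prename, Surname from person p1
where Prename is not null and Surname is not null
  and not exists (
    select * from person p2
    where p1.Surname = p2.Surname
      and p2.Prename like '%' || p1.Prename || '%'
      and p2.id <> p1.id)
  and not exists (
    select * from person p2
    where p1.Surname = p2.Altname
      and p2.Prename like '%' || p1.Prename || '%'
      and p2.id <> p1.id)
  and inv_date is null
"""


def run(query):
    """Return the query result as a sorted list of (Prename, Surname) tuples."""
    return sorted(conn.execute(query).fetchall())


if __name__ == "__main__":
    print(run(Q_OR))
    print(run(Q_SPLIT))
    print("same rows:", run(Q_OR) == run(Q_SPLIT))
```

The rewrite is an application of De Morgan's law: "no row matches A or B" is the same as "no row matches A, and no row matches B". Rows where `Altname` is null behave the same in both forms, because a comparison with null never satisfies the subquery's `where` clause.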
stracktracer