The two tables I am querying both have ~150 million rows.
I killed the following statement after it had not returned within 45 minutes, so I do not know how long it would actually run:
select * from Cats cat where not exists( select dog.foo,dog.bar from Dogs dog where cat.foo = dog.foo and cat.bar = dog.bar);
However, this query finishes in about 3 minutes:
select * from Cats outside where not exists(select * from Cats cat where exists( select dog.foo,dog.bar from Dogs dog where cat.foo = dog.foo and cat.bar = dog.bar));
My question is: what is going on behind the scenes that gives me this performance improvement?
My reasoning for why both queries return the same result set:
The first query (slow) says: give me all the elements of Cats for which no matching row exists in Dogs.
The second query (fast) says: give me all the elements of Cats that do not exist in the subset of Cats that does have a match in Dogs.
I expect the following subquery:
select dog.foo,dog.bar from Dogs dog where cat.foo = dog.foo and cat.bar = dog.bar
to return [A, B, C]
This subquery is common to both queries.
My Cats table contains the following: [A, B, C, D, E]
I expect the following fragment:
select * from Cats cat where exists
to return [A, B, C] and the last fragment:
select * from Cats outside where not exists
to return [D, E]
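The expectation for the common subquery can be checked on a toy dataset. The following is a minimal sketch, assuming an in-memory SQLite database driven from Python's `sqlite3` module as a stand-in (the real DBMS is not named here, and its planner may behave differently). The sample rows are hypothetical: `foo` holds the letters from the walk-through and `bar` is a constant.

```python
import sqlite3

# In-memory SQLite stands in for the real database (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("create table Cats (foo text, bar integer)")
conn.execute("create table Dogs (foo text, bar integer)")
# Hypothetical sample data: Cats = [A..E], Dogs = [A, B, C].
conn.executemany("insert into Cats values (?, 1)", [(c,) for c in "ABCDE"])
conn.executemany("insert into Dogs values (?, 1)", [(c,) for c in "ABC"])

# The correlated subquery shared by both statements.
match = ("select dog.foo, dog.bar from Dogs dog "
         "where cat.foo = dog.foo and cat.bar = dog.bar")

cats_in_dogs = [r[0] for r in conn.execute(
    f"select foo from Cats cat where exists ({match}) order by foo")]
cats_not_in_dogs = [r[0] for r in conn.execute(
    f"select foo from Cats cat where not exists ({match}) order by foo")]

print(cats_in_dogs)      # ['A', 'B', 'C']
print(cats_not_in_dogs)  # ['D', 'E']
```

On this data the `exists` form gives [A, B, C] and the `not exists` form (the slow query) gives [D, E]. This only confirms the intended sets; it says nothing about timing at 150 million rows.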
UPDATE
Set notation to back up my claims mathematically (please correct me if I used the wrong symbols):
∀ Cat (∃ cat ≠ ∃ dog)
For all elements in Cat, return a collection containing each cat element that is not equal to the element in dog
∀ Cat (∃ cat = ∃ dog)
For all elements in Cat, return a collection containing each cat element that is equal to the element in dog
∀ Cat (∃ innerCat ≠ ∃ cat)
For all elements in Cat, return a set containing each element of the inner cat that is not equal to the element in cat
Second update
I see that my math did not match my SQL.
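To illustrate the mismatch, here is a minimal sketch on toy data, again assuming an in-memory SQLite database via Python's `sqlite3` as a stand-in for the real DBMS. In the fast query as written, the middle subquery never references the `outside` alias, so it does not depend on the outer row. The `correlated` variant is only a hypothetical guess at what was intended, not a verified fix.

```python
import sqlite3

# In-memory SQLite stands in for the real database (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table Cats (foo text, bar integer);
    create table Dogs (foo text, bar integer);
    insert into Cats values ('A',1),('B',1),('C',1),('D',1),('E',1);
    insert into Dogs values ('A',1),('B',1),('C',1);
""")

# The fast query as posted: the middle subquery ignores `outside`.
as_written = """
    select foo from Cats outside where not exists (
        select * from Cats cat where exists (
            select dog.foo, dog.bar from Dogs dog
            where cat.foo = dog.foo and cat.bar = dog.bar))
    order by foo
"""

# Hypothetical variant that ties the middle subquery to the outer row.
correlated = """
    select foo from Cats outside where not exists (
        select * from Cats cat
        where cat.foo = outside.foo and cat.bar = outside.bar
          and exists (
            select dog.foo, dog.bar from Dogs dog
            where cat.foo = dog.foo and cat.bar = dog.bar))
    order by foo
"""

rows_as_written = [r[0] for r in conn.execute(as_written)]
rows_correlated = [r[0] for r in conn.execute(correlated)]

print(rows_as_written)  # []
print(rows_correlated)  # ['D', 'E']
```

On this data the query as written returns no rows at all, because the uncorrelated subquery is non-empty for every outer row, while the correlated variant returns [D, E]. That the subquery does not depend on the outer row may also be part of why the original finishes so quickly, though that would need to be confirmed against the actual execution plan.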