Is it better to put more logic in the ON clause, or only the minimum necessary?

Question

Given these two queries:

SELECT t1.id, t2.companyName FROM table1 t1 INNER JOIN table2 t2 ON t2.id = t1.fkId WHERE t2.aField <> 'C'

OR:

SELECT t1.id, t2.companyName FROM table1 t1 INNER JOIN table2 t2 ON t2.id = t1.fkId AND t2.aField <> 'C'

Is there an obvious difference between the two? It seems to me that the condition "t2.aField <> 'C'" will be evaluated for every row in t2 that meets the join criteria either way. Am I wrong?

Update: I turned on "Include Actual Execution Plan" in SQL Server. The plans for the two queries were identical.
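To check the two queries from the question on a small data set, here is a minimal sketch in Python using the built-in sqlite3 module. SQLite stands in for SQL Server here, and the sample rows and the ORDER BY are my own assumptions, not part of the original question. It only shows that both forms return the same rows; it says nothing about performance.

```python
import sqlite3

# In-memory SQLite database as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, companyName TEXT, aField TEXT);
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, fkId INTEGER);
    -- Hypothetical sample data: one 'C' row, one NULL row, one unmatched fkId.
    INSERT INTO table2 VALUES (1, 'Acme', 'A'), (2, 'Globex', 'C'), (3, 'Initech', NULL);
    INSERT INTO table1 VALUES (10, 1), (11, 2), (12, 3), (13, NULL);
""")

# Filter in the WHERE clause (first query in the question).
filter_in_where = """
    SELECT t1.id, t2.companyName
    FROM table1 t1
    INNER JOIN table2 t2 ON t2.id = t1.fkId
    WHERE t2.aField <> 'C'
    ORDER BY t1.id
"""

# Filter in the ON clause (second query in the question).
filter_in_on = """
    SELECT t1.id, t2.companyName
    FROM table1 t1
    INNER JOIN table2 t2 ON t2.id = t1.fkId AND t2.aField <> 'C'
    ORDER BY t1.id
"""

rows_where = conn.execute(filter_in_where).fetchall()
rows_on = conn.execute(filter_in_on).fetchall()

print(rows_where)
print(rows_on)
print("same rows:", rows_where == rows_on)
```

Note that the row with a NULL aField drops out of both results, because NULL <> 'C' does not evaluate to true.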

+7

sql sql-server

jcollum Apr 30 '09 at 21:00

6 answers

There is a difference. You should do an EXPLAIN PLAN for both queries and look at it in detail.

As a simplified explanation: the WHERE clause is executed only after the two tables have been joined, so it is evaluated for each row returned from the join, and not necessarily for every row of table2.

Performance-wise, it is best to eliminate unwanted rows early, so that joins, WHERE clauses, and other operations have fewer rows to deal with later.

In the second example, two conditions must hold for rows to be joined together, which is why it will usually give different results than the first.

+3

Azder Apr 30 '09 at 21:14

It depends.

 SELECT t1.foo, t2.bar FROM table1 t1 LEFT JOIN table2 t2 ON t1.SomeId = t2.SomeId WHERE t2.SomeValue IS NULL

differs from

 SELECT t1.foo, t2.bar FROM table1 t1 LEFT JOIN table2 t2 ON t1.SomeId = t2.SomeId AND t2.SomeValue IS NULL

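This is different because the first filters the joined result: every row whose t2.SomeValue is not NULL is removed, so a t1 row survives only if it has no match in t2 or matches a t2 row whose SomeValue is NULL. The latter only restricts which t2 rows can be joined: every t1 row is still returned, with NULL in the t2 columns when no t2 row with a NULL SomeValue matches.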

Just use the ON clause for the join condition and the WHERE clause for the filter.
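The difference between the two LEFT JOIN queries above can be seen on three rows. This is a minimal sketch in Python with the built-in sqlite3 module; SQLite is used as a stand-in for SQL Server, and the sample data and the ORDER BY are assumptions added for illustration.

```python
import sqlite3

# In-memory SQLite database as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (SomeId INTEGER, foo TEXT);
    CREATE TABLE table2 (SomeId INTEGER, bar TEXT, SomeValue TEXT);
    -- Hypothetical data: f1 has no match, f2 matches a NULL SomeValue,
    -- f3 matches a non-NULL SomeValue.
    INSERT INTO table1 VALUES (1, 'f1'), (2, 'f2'), (3, 'f3');
    INSERT INTO table2 VALUES (2, 'b2', NULL), (3, 'b3', 'x');
""")

# Condition in WHERE: applied to the joined rows, so f3 is removed.
in_where = conn.execute("""
    SELECT t1.foo, t2.bar
    FROM table1 t1
    LEFT JOIN table2 t2 ON t1.SomeId = t2.SomeId
    WHERE t2.SomeValue IS NULL
    ORDER BY t1.SomeId
""").fetchall()

# Condition in ON: only limits which t2 rows may join, so f3 stays
# and gets NULL for the t2 columns.
in_on = conn.execute("""
    SELECT t1.foo, t2.bar
    FROM table1 t1
    LEFT JOIN table2 t2 ON t1.SomeId = t2.SomeId AND t2.SomeValue IS NULL
    ORDER BY t1.SomeId
""").fetchall()

print(in_where)  # [('f1', None), ('f2', 'b2')]
print(in_on)     # [('f1', None), ('f2', 'b2'), ('f3', None)]
```

With an outer join, the ON form keeps every row of the left table, which is why the condition's placement changes the result here but not in the INNER JOIN from the question.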

+2

Tomalak Apr 30 '09 at 21:47

Unless moving the join condition to the WHERE clause changes the meaning of the query (as in the LEFT JOIN example above), it doesn't matter where you put it. SQL will reorder the conditions, and as long as they are provably equivalent, you will get the same query.

That said, I think this is more of a logic/readability thing. I usually put everything that relates the two tables in the join, and everything that filters in the WHERE clause.

+1

John gibb Apr 30 '09 at 10:51

I would prefer the first query. SQL Server will choose the best join type for your query based on your indexes, after which the WHERE clause will be applied. But you can run both queries together, view the execution plans, compare them, and pick the fastest (then optimize further by adding indexes).
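As a rough sketch of that kind of side-by-side comparison, the snippet below prints the plan for each form of the question's query. It uses Python's built-in sqlite3 module and SQLite's EXPLAIN QUERY PLAN, so it is only an analogy: in SQL Server you would look at the actual execution plan in Management Studio instead, and the optimizer there may behave differently. The index name ix_table1_fkId is made up for the example.

```python
import sqlite3

# In-memory SQLite database as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, companyName TEXT, aField TEXT);
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, fkId INTEGER);
    -- Hypothetical index on the foreign key column.
    CREATE INDEX ix_table1_fkId ON table1 (fkId);
""")

def plan(sql):
    """Return the human-readable steps SQLite reports for a query."""
    # The description text is the last column of each EXPLAIN QUERY PLAN row.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

plan_where = plan("""
    SELECT t1.id, t2.companyName
    FROM table1 t1
    INNER JOIN table2 t2 ON t2.id = t1.fkId
    WHERE t2.aField <> 'C'
""")

plan_on = plan("""
    SELECT t1.id, t2.companyName
    FROM table1 t1
    INNER JOIN table2 t2 ON t2.id = t1.fkId AND t2.aField <> 'C'
""")

print("filter in WHERE:")
for step in plan_where:
    print("  ", step)

print("filter in ON:")
for step in plan_on:
    print("  ", step)

print("plans identical:", plan_where == plan_on)
```

If the two plans come out identical, as the asker reported for SQL Server, the choice between ON and WHERE for an INNER JOIN comes down to readability.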

0

Irina c Apr 30 '09 at 21:16

Unless you are working on a single-user application or something similarly small that creates a trivial load, the only consideration that matters is how the server will process your query.

Answers that mention query plans give good advice.

Also, turn on I/O statistics (SET STATISTICS IO ON) to get an idea of how many reads your query performs (I especially like Azder's post).

Think of each database server as a data pump from disk to client. This pump runs faster if it only performs the IO necessary to complete the job. If the data is in the cache, it will be even faster. But you do not want to read more than you need from the disk - this will lead to crowding out useful data from your cache for no good reason.

0

yetanotherdave Apr 30 '09 at 21:58

Bravax · Accepted Answer · 2009-04-30T21:13:12+0000

I prefer to use the join criteria to express how the tables are joined together. So I would put the additional condition in the WHERE clause.

I hope (although I don't have statistics) that SQL Server will be smart enough to find the optimal query plan, regardless of the syntax you use.

However, if you have an index that contains both id and aField, I would suggest putting both conditions in the INNER JOIN criteria.

It would be interesting to see the query plan in these 2 (or 3) scenarios and see what happens. Good question.
