Which of these two ways of coding an inner join is faster?

I prefer to write T-SQL using what is effectively an inline join, rather than having a long list of joins at the end of a stored procedure or view.

For example, I write code like this:

    SELECT PKey, Billable,
        (SELECT LastName FROM Contact.dbo.Contacts WHERE (Pkey = Contacts_PKey)),
        (SELECT Description FROM Common.dbo.LMain WHERE (PKey = DType)),
        (SELECT TaskName FROM Common.dbo.LTask WHERE (PKey = TaskType)),
        StartTime, EndTime, SavedTime
    FROM dbo.TopicLog
    WHERE StartTime > '7/9/09'
    ORDER BY StartTime

instead of this:

    SELECT t.PKey, t.Billable, c.LastName, m.Description, lt.TaskName,
        t.StartTime, t.EndTime, t.SavedTime
    FROM dbo.TopicLog AS t
    INNER JOIN Contact.dbo.Contacts AS c
        ON c.Pkey = t.Contacts_PKey AND t.StartTime > '7/9/09'
    INNER JOIN Common.dbo.LMain AS m ON m.PKey = t.DType
    INNER JOIN Common.dbo.LTask AS lt ON lt.PKey = t.TaskType
    ORDER BY t.StartTime

I prefer the first (inline) syntax because it is much less confusing to write and debug, especially when many tables are being joined or other things are going on (CASE expressions, other T-SQL, self joins, etc.).

But my question is: am I taking a performance hit by querying the database this way?

I do not have enough data collected yet to measure the difference, but I will at some point down the road.

I would like to find out before going any further. I would not want to come back later and redo everything to improve performance.

+6
join sql-server tsql sql-server-2005
8 answers

The second (the actual inner join), as a rule. The first (subqueries) runs 3 queries for every row, although the compiler usually handles this in a way that mitigates the difference.

Best of all: check the query plans for yourself!

Since you are getting slow performance, my guess is that your tables are not indexed properly. You should have clustered indexes on all of your primary keys and nonclustered indexes on the foreign keys (the columns you use in the joins).
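
To make the "check the plan, then index the join columns" advice concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module purely as a stand-in, since the question is about SQL Server: SQLite has no clustered/nonclustered distinction, and `EXPLAIN QUERY PLAN` is SQLite's counterpart to looking at an execution plan in SQL Server. The two-column tables and the index name `ix_TopicLog_Contacts_PKey` are made up for illustration.

```python
import sqlite3

# In-memory SQLite database as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Contacts (PKey INTEGER PRIMARY KEY, LastName TEXT);
    CREATE TABLE TopicLog (
        PKey INTEGER PRIMARY KEY,
        Contacts_PKey INTEGER REFERENCES Contacts (PKey)
    );
""")


def plan(sql):
    """Return the human-readable detail text of each step in SQLite's plan."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


query = "SELECT PKey FROM TopicLog WHERE Contacts_PKey = 1"

plan_before = plan(query)  # no index on the foreign key yet: full table scan
conn.execute(
    "CREATE INDEX ix_TopicLog_Contacts_PKey ON TopicLog (Contacts_PKey)"
)
plan_after = plan(query)   # the plan should now mention the new index

print(plan_before)
print(plan_after)
```

The exact plan wording differs between SQLite versions, so the useful thing to look for is whether the index name shows up in the second plan.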

I should note that these two queries are equivalent if and only if you have matching values for all of your join conditions (i.e. every row of the main table is always returned). Otherwise, the subquery gives you NULL when there is no match, whereas inner joins actively filter out any rows that do not match the join conditions. The subquery approach is actually equivalent (in results, not in speed or execution) to a left outer join.
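
A small sketch of that difference, again using Python's sqlite3 as a stand-in for SQL Server. The cut-down `Contacts` and `TopicLog` tables and their rows are invented for illustration; the point is only the shape of the three result sets.

```python
import sqlite3

# Toy versions of the tables in the question (SQLite, not SQL Server).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Contacts (PKey INTEGER PRIMARY KEY, LastName TEXT);
    CREATE TABLE TopicLog (PKey INTEGER PRIMARY KEY, Contacts_PKey INTEGER);
    INSERT INTO Contacts VALUES (1, 'Smith');
    INSERT INTO TopicLog VALUES (10, 1);   -- has a matching contact
    INSERT INTO TopicLog VALUES (11, 99);  -- no matching contact
""")

# Correlated subquery: every TopicLog row comes back, NULL when no match.
subquery_rows = conn.execute("""
    SELECT t.PKey,
           (SELECT c.LastName FROM Contacts AS c WHERE c.PKey = t.Contacts_PKey)
    FROM TopicLog AS t
    ORDER BY t.PKey
""").fetchall()

# Inner join: the unmatched TopicLog row is filtered out.
inner_rows = conn.execute("""
    SELECT t.PKey, c.LastName
    FROM TopicLog AS t
    INNER JOIN Contacts AS c ON c.PKey = t.Contacts_PKey
    ORDER BY t.PKey
""").fetchall()

# Left outer join: same rows as the correlated subquery.
left_rows = conn.execute("""
    SELECT t.PKey, c.LastName
    FROM TopicLog AS t
    LEFT OUTER JOIN Contacts AS c ON c.PKey = t.Contacts_PKey
    ORDER BY t.PKey
""").fetchall()

print(subquery_rows)  # [(10, 'Smith'), (11, None)]
print(inner_rows)     # [(10, 'Smith')]
print(left_rows)      # [(10, 'Smith'), (11, None)]
```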

+20

The first method is not an inner join at all; it is a correlated subquery. Correlated subqueries are more like left outer joins than inner joins, since they return NULL when there is no matching value.

+10

The first one looks like a pathological way to do a join to me. I would avoid it, if for no other reason than that it is unusual: an experienced SQL DBA who has to maintain it will spend time looking for the reason it was coded that way, when there is no real reason as far as what you want the query to do. It also behaves more like an outer join if data is missing.

The second example looks fine.

You should be aware that the old-school way of writing inner joins looks like this:

    SELECT t.PKey, t.Billable, c.LastName, m.Description, lt.TaskName,
        t.StartTime, t.EndTime, t.SavedTime
    FROM dbo.TopicLog AS t, Contact.dbo.Contacts AS c,
        Common.dbo.LMain AS m, Common.dbo.LTask AS lt
    WHERE c.Pkey = t.Contacts_PKey
        AND t.StartTime > '7/9/09'
        AND m.PKey = t.DType
        AND lt.PKey = t.TaskType
    ORDER BY t.StartTime

At a guess, this is equivalent to the modern "inner join table on field" syntax once it has been parsed.

As another answer says, if you are looking for faster queries, the first thing to do is check that the tables' indexes are in order. Then review the query execution plan.
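
That the two spellings return the same rows can be checked on toy data. The sketch below uses Python's sqlite3 as a stand-in for SQL Server, with invented rows and only a few of the columns from the question. It compares results only and says nothing about how either engine parses or plans the queries.

```python
import sqlite3

# Toy data in SQLite (an assumption; the question is about SQL Server).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Contacts (PKey INTEGER PRIMARY KEY, LastName TEXT);
    CREATE TABLE LTask (PKey INTEGER PRIMARY KEY, TaskName TEXT);
    CREATE TABLE TopicLog (
        PKey INTEGER PRIMARY KEY,
        Contacts_PKey INTEGER,
        TaskType INTEGER
    );
    INSERT INTO Contacts VALUES (1, 'Smith');
    INSERT INTO Contacts VALUES (2, 'Jones');
    INSERT INTO LTask VALUES (1, 'Design');
    INSERT INTO LTask VALUES (2, 'Build');
    INSERT INTO TopicLog VALUES (10, 1, 2);
    INSERT INTO TopicLog VALUES (11, 2, 1);
    INSERT INTO TopicLog VALUES (12, 3, 1);  -- contact 3 does not exist
""")

# Old-style join: comma-separated tables, join conditions in WHERE.
old_style_rows = conn.execute("""
    SELECT t.PKey, c.LastName, lt.TaskName
    FROM TopicLog AS t, Contacts AS c, LTask AS lt
    WHERE c.PKey = t.Contacts_PKey
      AND lt.PKey = t.TaskType
    ORDER BY t.PKey
""").fetchall()

# Explicit INNER JOIN ... ON syntax.
explicit_rows = conn.execute("""
    SELECT t.PKey, c.LastName, lt.TaskName
    FROM TopicLog AS t
    INNER JOIN Contacts AS c ON c.PKey = t.Contacts_PKey
    INNER JOIN LTask AS lt ON lt.PKey = t.TaskType
    ORDER BY t.PKey
""").fetchall()

print(old_style_rows)                   # [(10, 'Smith', 'Build'), (11, 'Jones', 'Design')]
print(old_style_rows == explicit_rows)  # True
```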

+3

The two queries in the OP say very different things and only give the same results if certain assumptions about the data model hold:

  • Each of the columns used in the lookup has a NOT NULL constraint and a foreign key constraint.

  • The lookup is done on the primary key or a unique key of the lookup table.

It may be that in the OP's specific case these assumptions hold, but in the general case the two queries are different.

As others have already pointed out, the subquery is more like an outer join in that it will return NULL for the LastName, Description and TaskName columns, rather than filtering out the row completely.

In addition, if one of the subqueries returns more than one row, you will get an error.

As for personal preference, I prefer the second example with the join syntax, but that is subjective.
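
One way to test those assumptions against real data before relying on either form is to look for duplicate lookup keys and for log rows with no match. The sketch below does this on invented data with Python's sqlite3 as a stand-in for SQL Server. The `LTask` table is deliberately created without a primary key so that a duplicate can exist. Note that SQLite does not raise an error for a multi-row scalar subquery the way SQL Server does, so only the checks themselves are shown.

```python
import sqlite3

# Toy data that deliberately violates the assumptions (SQLite stand-in).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE LTask (PKey INTEGER, TaskName TEXT);  -- no key constraint
    CREATE TABLE TopicLog (PKey INTEGER PRIMARY KEY, TaskType INTEGER);
    INSERT INTO LTask VALUES (1, 'Design');
    INSERT INTO LTask VALUES (1, 'Design v2');  -- duplicate lookup key
    INSERT INTO LTask VALUES (2, 'Build');
    INSERT INTO TopicLog VALUES (10, 1);
    INSERT INTO TopicLog VALUES (11, 2);
    INSERT INTO TopicLog VALUES (12, 7);     -- no such task
    INSERT INTO TopicLog VALUES (13, NULL);  -- NULL lookup column
""")

# Lookup keys that occur more than once (breaks the unique-key assumption).
duplicate_keys = conn.execute("""
    SELECT PKey, COUNT(*)
    FROM LTask
    GROUP BY PKey
    HAVING COUNT(*) > 1
""").fetchall()

# Log rows with no matching lookup row (breaks NOT NULL / foreign key).
orphan_rows = conn.execute("""
    SELECT t.PKey
    FROM TopicLog AS t
    LEFT OUTER JOIN LTask AS lt ON lt.PKey = t.TaskType
    WHERE lt.PKey IS NULL
    ORDER BY t.PKey
""").fetchall()

# Effect on the inner join: row 10 is duplicated, rows 12 and 13 vanish.
joined_keys = [row[0] for row in conn.execute("""
    SELECT t.PKey
    FROM TopicLog AS t
    INNER JOIN LTask AS lt ON lt.PKey = t.TaskType
    ORDER BY t.PKey
""")]

print(duplicate_keys)  # [(1, 2)]
print(orphan_rows)     # [(12,), (13,)]
print(joined_keys)     # [10, 10, 11]
```

If both checks come back empty on the real tables, the subquery form and the inner join form should return the same rows.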

+1

Generally speaking, there is no difference in performance between simple subqueries and joins. It is a common misconception that subqueries are much slower (because SQL Server has to loop over the inner query), but generally speaking this is simply not true! During compilation, SQL Server builds an execution tree, and often in these trees the subqueries are equivalent to joins.

It is worth noting that your two queries are not logically the same and gave different results for me. The second query should really read something along these lines (this is still not identical, but closer):

    SELECT t.PKey, t.Billable, c.LastName, m.Description, lt.TaskName,
        t.StartTime, t.EndTime, t.SavedTime
    FROM dbo.TopicLog AS t
    LEFT OUTER JOIN Contact.dbo.Contacts AS c ON c.Pkey = t.Contacts_PKey
    LEFT OUTER JOIN Common.dbo.LMain AS m ON m.PKey = t.DType
    LEFT OUTER JOIN Common.dbo.LTask AS lt ON lt.PKey = t.TaskType
    WHERE t.StartTime > '7/9/09'
    ORDER BY t.StartTime
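
Note that this version moves the StartTime filter from the ON clause into WHERE. With an inner join the placement makes no difference to the rows returned, but with a left outer join it does. Here is a minimal sketch on invented data, using Python's sqlite3 as a stand-in for SQL Server and ISO date strings instead of '7/9/09' to keep the comparison unambiguous.

```python
import sqlite3

# Toy data in SQLite (assumption; the thread is about SQL Server).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Contacts (PKey INTEGER PRIMARY KEY, LastName TEXT);
    CREATE TABLE TopicLog (
        PKey INTEGER PRIMARY KEY,
        Contacts_PKey INTEGER,
        StartTime TEXT
    );
    INSERT INTO Contacts VALUES (1, 'Smith');
    INSERT INTO TopicLog VALUES (10, 1, '2009-07-01');  -- before the cutoff
    INSERT INTO TopicLog VALUES (11, 1, '2009-07-10');  -- after the cutoff
""")


def run(join_kind, filter_in_on):
    """Run the join with the date filter either in ON or in WHERE."""
    date_filter = "t.StartTime > '2009-07-09'"
    on_extra = " AND " + date_filter if filter_in_on else ""
    where = "" if filter_in_on else " WHERE " + date_filter
    sql = (
        "SELECT t.PKey, c.LastName FROM TopicLog AS t "
        + join_kind + " JOIN Contacts AS c ON c.PKey = t.Contacts_PKey"
        + on_extra + where + " ORDER BY t.PKey"
    )
    return conn.execute(sql).fetchall()


inner_on = run("INNER", filter_in_on=True)
inner_where = run("INNER", filter_in_on=False)
left_on = run("LEFT OUTER", filter_in_on=True)
left_where = run("LEFT OUTER", filter_in_on=False)

print(inner_on)     # [(11, 'Smith')]
print(inner_where)  # [(11, 'Smith')]
print(left_on)      # [(10, None), (11, 'Smith')]  old row kept, contact NULL
print(left_where)   # [(11, 'Smith')]
```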

In my testing, the subquery version compiled to an execution plan with significantly fewer reads (15 as opposed to 1000) but slightly higher CPU; on average, the run times were roughly equivalent.

It is worth noting, however, that this will not always be the case (especially when functions are evaluated inside a subquery), and sometimes you may run into problems because of a subquery. In general, though, it is best to worry about such cases only when you actually hit performance problems.

+1

In general, subqueries (i.e. the first example) are slower, but the easiest way to optimize and analyze your queries is to try them against your specific database. MS SQL Server provides excellent analysis and performance tuning tools.

0

Many SQL programmers are completely unaware that the optimizer often resolves subqueries into joins. There is probably no reason to expect performance problems with either query.

Look at the execution plan!

0

I think the second one is faster. My reasoning is that it uses aliases (t, c, m, etc. in your example), so the relational engine can easily find the pointer to the table's location.

I think this is one of the tips in SQL tuning.

0