SQL joins vs. SQL subqueries (performance)?

I want to know: if I have a join query something like this -

Select E.Id,E.Name from Employee E join Dept D on E.DeptId=D.Id 

and a subquery something like this -

 Select E.Id,E.Name from Employee E Where E.DeptId in (Select Id from Dept) 

When I consider performance, which of the two queries will be faster, and why?

Also, is there a time when I should prefer one over the other?

Sorry if this is too trivial or has been asked before, but I'm confused. Also, it would be great if you could suggest tools that I should use to measure the performance of two queries. Thank you very much!

+97
performance sql join sql-server-2008 subquery
04 Oct '10 at 14:25
9 answers

I would EXPECT the first query to be faster, mainly because you have an equality comparison and an explicit JOIN. In my experience, IN is a very slow operator, because SQL usually evaluates it as a series of WHERE clauses separated by "OR" ( WHERE x=Y OR x=Z OR... ).

As with ALL THINGS SQL, though, your mileage may vary. The speed will depend heavily on indexes (do you have indexes on both ID columns? That will help a lot...), among other things.

The only REAL way to tell with 100% certainty which is faster is to turn on performance tracking (IO statistics are especially useful) and run them both. Remember to clear the cache between runs!
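As a rough sketch of that kind of side-by-side measurement: the snippet below is an illustration only, and it assumes an in-memory SQLite database (via Python's sqlite3 module) as a stand-in for SQL Server, with made-up row counts and a hypothetical index name. On SQL Server itself you would typically run the two queries with SET STATISTICS IO ON and SET STATISTICS TIME ON instead. The numbers it prints say nothing about how your real server will behave.

```python
import sqlite3
import time

# Assumption: in-memory SQLite stands in for the real server; table and
# column names follow the question, the data volumes are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Dept (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Employee (Id INTEGER PRIMARY KEY, Name TEXT, DeptId INTEGER);
    CREATE INDEX IX_Employee_DeptId ON Employee (DeptId);
""")
conn.executemany("INSERT INTO Dept VALUES (?, ?)",
                 [(d, f"dept{d}") for d in range(1, 101)])
# Every tenth employee points at DeptId 0, which does not exist in Dept.
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?)",
                 [(e, f"emp{e}", 0 if e % 10 == 0 else e % 100 + 1)
                  for e in range(1, 10001)])

JOIN_SQL = "SELECT E.Id, E.Name FROM Employee E JOIN Dept D ON E.DeptId = D.Id"
IN_SQL = "SELECT E.Id, E.Name FROM Employee E WHERE E.DeptId IN (SELECT Id FROM Dept)"


def timed(sql, runs=5):
    """Run the query several times; return (best wall-clock seconds, sorted rows)."""
    best = float("inf")
    rows = []
    for _ in range(runs):
        start = time.perf_counter()
        rows = conn.execute(sql).fetchall()
        best = min(best, time.perf_counter() - start)
    return best, sorted(rows)


def plan(sql):
    """Return the detail column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


join_time, join_rows = timed(JOIN_SQL)
in_time, in_rows = timed(IN_SQL)
print(f"JOIN: {join_time:.6f}s, {len(join_rows)} rows, plan: {plan(JOIN_SQL)}")
print(f"IN:   {in_time:.6f}s, {len(in_rows)} rows, plan: {plan(IN_SQL)}")
```

Checking that both queries return the same rows before comparing timings is worth the extra line, since a faster query that returns something different is not a fair comparison.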

+43
04 Oct 2018-10-14

Well, I think this is an "old but gold" question. The answer is: "It depends!" Performance is such a delicate subject that it would be silly to say: "Never use subqueries, always join." In the following links you will find some basic best practices that I found very helpful: Here 1, Here 2, Here 3

I have a table with 50,000 rows, and the result I was looking for had 739 rows.

At first, my query was as follows:

 SELECT p.id, p.fixedId, p.azienda_id, p.categoria_id, p.linea, p.tipo, p.nome
 FROM prodotto p
 WHERE p.azienda_id = 2699
   AND p.anno = (
     SELECT MAX(p2.anno)
     FROM prodotto p2
     WHERE p2.fixedId = p.fixedId
   )

and it took 7.9s to complete.

In the end, my query was:

 SELECT p.id, p.fixedId, p.azienda_id, p.categoria_id, p.linea, p.tipo, p.nome
 FROM prodotto p
 WHERE p.azienda_id = 2699
   AND (p.fixedId, p.anno) IN (
     SELECT p2.fixedId, MAX(p2.anno)
     FROM prodotto p2
     WHERE p.azienda_id = p2.azienda_id
     GROUP BY p2.fixedId
   )

and it took 0.0256s.

Good SQL, good.

+32
Jul 05 '13 at

Start by looking at the execution plans to see the differences in how SQL Server will interpret the two queries. You can also use Profiler to actually run the queries multiple times and measure the difference.

I would not expect these two to be so terribly different. Where you can get real, large performance gains from using joins instead of subqueries is when you are replacing correlated subqueries.

EXISTS is often better than either of these two, and when you are talking about a left join where you want all records that are not in the left-joined table, NOT EXISTS is often a much better choice.
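To make the three forms concrete, here is a minimal sketch. It assumes an in-memory SQLite database (through Python's sqlite3 module) with the question's table names and a few invented rows; it only shows that the LEFT JOIN ... IS NULL form and the NOT EXISTS form return the same rows, not which one is faster on your server.

```python
import sqlite3

# Assumption: tiny invented data set; 'Cy' points at a missing department
# and 'Di' has no department at all.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Dept (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Employee (Id INTEGER PRIMARY KEY, Name TEXT, DeptId INTEGER);
    INSERT INTO Dept VALUES (1, 'Sales'), (2, 'IT');
    INSERT INTO Employee VALUES
        (1, 'Ann', 1), (2, 'Bob', 2), (3, 'Cy', 3), (4, 'Di', NULL);
""")

# Employees that DO have a department: EXISTS instead of JOIN or IN.
exists_sql = """
    SELECT E.Id, E.Name FROM Employee E
    WHERE EXISTS (SELECT 1 FROM Dept D WHERE D.Id = E.DeptId)
    ORDER BY E.Id
"""

# Employees with NO department, written as a left join filtered on NULL.
left_join_sql = """
    SELECT E.Id, E.Name FROM Employee E
    LEFT JOIN Dept D ON D.Id = E.DeptId
    WHERE D.Id IS NULL
    ORDER BY E.Id
"""

# The same question asked with NOT EXISTS.
not_exists_sql = """
    SELECT E.Id, E.Name FROM Employee E
    WHERE NOT EXISTS (SELECT 1 FROM Dept D WHERE D.Id = E.DeptId)
    ORDER BY E.Id
"""

matched = conn.execute(exists_sql).fetchall()
unmatched_left_join = conn.execute(left_join_sql).fetchall()
unmatched_not_exists = conn.execute(not_exists_sql).fetchall()

print("matched:", matched)
print("unmatched (LEFT JOIN):", unmatched_left_join)
print("unmatched (NOT EXISTS):", unmatched_not_exists)
```

Once you have confirmed the two "unmatched" forms agree on your own data, compare their execution plans to see which one your server handles better.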

+10
04 Oct 2018-10-10

Performance depends on the amount of data you are running the query against...

If there are fewer than 20K rows, JOIN works better.

If the data is more like 100K+ rows, then IN works better.

If you do not need data from the other table, IN is fine, but it is better to go for EXISTS.

I tested all of these cases, and the tables had proper indexes.

+8
Jun 28 2018-12-12T00:

The two queries may not be semantically equivalent. If an employee works in more than one department (possible in the company I work for; admittedly, this would mean that your table is not fully normalized), then the first query will return duplicate rows, while the second query will not. To make the queries equivalent in this case, the DISTINCT keyword must be added to the SELECT clause, which can affect performance.

Note that there is a design rule of thumb that states that a table should model an entity/class or a relationship between entities/classes, but not both. Therefore, I suggest you create a third table, say OrgChart, to model the relationship between employees and departments.
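Here is a small sketch of the duplicate-row effect, under stated assumptions: the OrgChart columns (EmployeeId, DeptId) and the sample rows are hypothetical, and an in-memory SQLite database (through Python's sqlite3 module) is used purely for illustration.

```python
import sqlite3

# Assumption: hypothetical OrgChart layout; 'Ann' belongs to two departments.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Dept (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE OrgChart (
        EmployeeId INTEGER,
        DeptId INTEGER,
        PRIMARY KEY (EmployeeId, DeptId)
    );
    INSERT INTO Employee VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO Dept VALUES (10, 'Sales'), (20, 'IT');
    INSERT INTO OrgChart VALUES (1, 10), (1, 20), (2, 10);
""")

# JOIN produces one output row per matching OrgChart row.
join_rows = conn.execute("""
    SELECT E.Id, E.Name FROM Employee E
    JOIN OrgChart O ON O.EmployeeId = E.Id
    ORDER BY E.Id
""").fetchall()

# IN only tests membership, so each employee appears at most once.
in_rows = conn.execute("""
    SELECT E.Id, E.Name FROM Employee E
    WHERE E.Id IN (SELECT EmployeeId FROM OrgChart)
    ORDER BY E.Id
""").fetchall()

# DISTINCT makes the JOIN form return the same rows as the IN form.
distinct_rows = conn.execute("""
    SELECT DISTINCT E.Id, E.Name FROM Employee E
    JOIN OrgChart O ON O.EmployeeId = E.Id
    ORDER BY E.Id
""").fetchall()

print("JOIN:         ", join_rows)
print("IN:           ", in_rows)
print("JOIN DISTINCT:", distinct_rows)
```

The extra DISTINCT step is the part that may cost something, so it is worth including it when you compare the two forms.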

+4
Sep 09 '11 at 9:51

Performance should be the same; it is much more important to have the correct indexes and clustering applied to your tables (there are some good resources on this topic).

(Edited to reflect the updated question)

+3
04 Oct '10 at 2:30 p.m.

You can use an explain plan to get an objective answer.

For your problem, an EXISTS filter would probably perform the fastest.

0
04 Oct 2018-10-14

The final query included azienda_id in the subquery, but your initial query did not include azienda_id in its subquery. So the comparison is not like for like.

-1
Jul 11 '13 at 17:06

I tested HLGEM's theory by comparing the numbers with "client statistics" turned on, and it turns out that NOT EXISTS is faster than a LEFT JOIN when searching for all records not in the left table.

The beauty of SQL is the many ways to write it, and performance depends not only on whether you use a join or a subquery, but also on what you are looking for.

-1
Mar 19 '15 at 3:47


