What makes the SQL query optimizer decide between a nested loop and a hash join

In general, what makes the SQL query optimizer decide between a nested loop and a hash join?

performance sql
2 answers

NESTED LOOPS is good if the condition inside the loop is sargable, that is, an index can be used to limit the number of records.

For a query like:

 SELECT * FROM a JOIN b ON b.b1 = a.a1 WHERE a.a2 = @myvar 

with a as the leading table, each record from a is taken, and all corresponding records in b must be found.

If b.b1 is indexed and has high cardinality, then NESTED LOOPS will be the preferred method.

In SQL Server, this is also the only way to execute non-equijoins (joins with something other than an = condition in the ON clause).
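As a rough illustration of the nested loops strategy for the query above, here is a minimal Python sketch. It is not how any engine is actually implemented: the rows are plain dictionaries, the function name `nested_loop_join` is made up for this example, and a sorted list searched with `bisect` stands in for an index on b.b1.

```python
from bisect import bisect_left, bisect_right


def nested_loop_join(a_rows, b_rows, a2_value):
    """Toy index nested loop join for:
    SELECT * FROM a JOIN b ON b.b1 = a.a1 WHERE a.a2 = @myvar
    """
    # Stand-in for an index on b.b1 (assumption): rows sorted by b1,
    # plus a parallel list of keys that can be binary-searched.
    b_sorted = sorted(b_rows, key=lambda row: row["b1"])
    b_keys = [row["b1"] for row in b_sorted]

    result = []
    for a in a_rows:                      # outer loop over the leading table a
        if a["a2"] != a2_value:           # WHERE a.a2 = @myvar
            continue
        # "Index seek": locate the range of b rows with b1 = a.a1
        lo = bisect_left(b_keys, a["a1"])
        hi = bisect_right(b_keys, a["a1"])
        for b in b_sorted[lo:hi]:         # inner loop touches matching rows only
            result.append({**a, **b})
    return result
```

The higher the cardinality of b.b1, the fewer rows fall between `lo` and `hi` for each record from a, which is why the index lookup pays off.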

HASH JOIN is the fastest method if all (or almost all) records need to be processed.

It takes all the records from b, builds a hash table over them, then takes all the records from a and uses the value of the join column as a key to look up the hash table.
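The build and probe steps described above can be sketched in a few lines of Python. This is only an in-memory illustration under simplifying assumptions (rows as dictionaries, a hypothetical function name `hash_join`, no spilling to disk), not the engine's real algorithm.

```python
from collections import defaultdict


def hash_join(a_rows, b_rows):
    """Toy hash join for: SELECT * FROM a JOIN b ON b.b1 = a.a1"""
    # Build phase: one pass over b, hashing every row on the join column b1.
    table = defaultdict(list)
    for b in b_rows:
        table[b["b1"]].append(b)

    # Probe phase: one pass over a, looking up a.a1 in the hash table.
    result = []
    for a in a_rows:
        for b in table.get(a["a1"], []):
            result.append({**a, **b})
    return result
```

Each input is read exactly once, so the work grows with the sum of the two table sizes rather than with their product.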

  • NESTED LOOPS takes this time:

    Na * (Nb / C) * R,

    where Na and Nb are the number of records in a and b, C is the index cardinality, and R is the constant time required for a row lookup (1 if all fields in the SELECT, WHERE and ORDER BY clauses are covered by the index, about 10 if they are not)

  • HASH JOIN takes this time:

    Na + (Nb * H)

    where H is the sum of the constants needed to build and look up the hash table (per record). They are programmed into the engine.

SQL Server computes the cardinality from the table statistics, calculates and compares the two values, and chooses the best plan.
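To make the comparison concrete, the two formulas can be plugged into a small Python function. This is only the simplified model from this answer, not SQL Server's actual cost model; the function name `estimate_join_costs` and the default values R = 10 and H = 2 are assumptions chosen for illustration.

```python
def estimate_join_costs(na, nb, c, r=10.0, h=2.0):
    """Compare the two simplified cost formulas.

    na, nb: number of records in a and b
    c:      cardinality of the index on b's join column
    r:      row lookup constant (assumed: 1 if covered by the index, ~10 if not)
    h:      per-record hash build/lookup constant (assumed value)
    """
    nested_loops = na * (nb / c) * r   # Na * (Nb / C) * R
    hash_join = na + nb * h            # Na + (Nb * H)
    choice = "NESTED LOOPS" if nested_loops <= hash_join else "HASH JOIN"
    return nested_loops, hash_join, choice


# Few records from a, highly selective index on b: the lookups win.
print(estimate_join_costs(10, 1_000_000, 1_000_000))

# Nearly all records from both tables, low cardinality: hashing wins.
print(estimate_join_costs(1_000_000, 1_000_000, 1_000))
```

With these made-up constants, the first call favors NESTED LOOPS and the second favors HASH JOIN, which matches the rule of thumb above.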


As a rule, it will depend on the size of the sets being joined.

I highly recommend reading "Inside Microsoft SQL Server 2008: T-SQL Querying" by Itzik Ben-Gan:

http://www.solidq.com/insidetsql/books/insidetsql2008/

(2005 edition also applies to this topic)

It covers your question, along with many others, when it comes to getting the most out of your queries.
