SQL - Filtering Large Tables Using Joins - Best Practices

I have a table with a lot of data, and I need to join it with some other large tables.

Only a small portion of my table is relevant each time.

At what point is it best to filter my data?

  • In the WHERE clause.

  • Select the relevant rows into a temporary table first, and only then join it.

  • Add the predicate to the ON clause of the first inner join.

  • Something else?

1.

    Select *
    From RealyBigTable
    Inner Join AnotherBigTable On …
    Inner Join YetAnotherBigTable On …
    Where RealyBigTable.Type = ?

2.

    Select * Into #temp
    From RealyBigTable
    Where RealyBigTable.Type = ?

    Select *
    From #temp
    Inner Join AnotherBigTable On …
    Inner Join YetAnotherBigTable On …

3.

    Select *
    From RealyBigTable
    Inner Join AnotherBigTable On RealyBigTable.type = ? And …
    Inner Join YetAnotherBigTable On …

Another question: what happens first, the JOIN or the WHERE?

performance sql sql-server
3 answers

Since you are using INNER JOINs, whether you put the filter in the WHERE clause or in a JOIN's ON clause is purely a matter of taste and style. Personally, I like to keep the conditions that relate two tables to each other (for example, the columns of a foreign key relationship) in the ON clause, and the actual data filters in the WHERE clause.

SQL Server will parse both forms into the same logical query tree and will therefore build identical execution plans.
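A minimal sketch of the result-level half of that claim, using Python's built-in sqlite3 module (SQLite, not SQL Server, so it says nothing about SQL Server's plans). The tables are hypothetical miniatures named after the ones in the question; for an INNER JOIN, moving the filter from WHERE into ON returns the same rows.

```python
import sqlite3

# Hypothetical, tiny stand-ins for the tables in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE RealyBigTable (id INTEGER PRIMARY KEY, Type TEXT, other_id INTEGER);
    CREATE TABLE AnotherBigTable (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO AnotherBigTable VALUES (1, 'one'), (2, 'two'), (3, 'three');
    INSERT INTO RealyBigTable VALUES
        (10, 'A', 1), (11, 'B', 1), (12, 'A', 2), (13, 'B', 3), (14, 'A', 4);
""")

# Option 1 from the question: filter in the WHERE clause.
where_sql = """
    SELECT r.id, a.name
    FROM RealyBigTable AS r
    INNER JOIN AnotherBigTable AS a ON a.id = r.other_id
    WHERE r.Type = ?
"""

# Option 3 from the question: the same filter in the inner join's ON clause.
on_sql = """
    SELECT r.id, a.name
    FROM RealyBigTable AS r
    INNER JOIN AnotherBigTable AS a ON a.id = r.other_id AND r.Type = ?
"""

# Sort in Python so the comparison does not depend on the order the plan returns rows in.
where_rows = sorted(conn.execute(where_sql, ("A",)).fetchall())
on_rows = sorted(conn.execute(on_sql, ("A",)).fetchall())
print(where_rows)  # [(10, 'one'), (12, 'two')]
print(on_rows)     # [(10, 'one'), (12, 'two')]
```

Row 14 drops out of both queries because its other_id (4) has no match, which is exactly what an inner join does regardless of where the Type filter sits.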

If you used [LEFT / RIGHT] OUTER JOINs instead, it would make a world of difference: not only would performance differ, but very likely the results would too.
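To illustrate why the results change with an outer join, here is a small sketch, again with Python's sqlite3 module and hypothetical tables and columns (the Status column is made up for the example). The same predicate on the right-hand table gives different rows depending on whether it sits in ON or in WHERE.

```python
import sqlite3

# Hypothetical data: three rows on the left, only some of them with matches on the right.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE RealyBigTable (id INTEGER PRIMARY KEY, Type TEXT);
    CREATE TABLE AnotherBigTable (id INTEGER PRIMARY KEY, big_id INTEGER, Status TEXT);
    INSERT INTO RealyBigTable VALUES (1, 'A'), (2, 'A'), (3, 'B');
    INSERT INTO AnotherBigTable VALUES (100, 1, 'open'), (101, 2, 'closed');
""")

# Filter inside ON: it only decides which right-hand rows match.
# Every left row is kept, padded with NULL when nothing matches.
on_sql = """
    SELECT r.id, a.id
    FROM RealyBigTable AS r
    LEFT OUTER JOIN AnotherBigTable AS a ON a.big_id = r.id AND a.Status = ?
    ORDER BY r.id
"""

# Same filter in WHERE: it is applied after the outer join, so the NULL-padded
# rows (a.Status IS NULL) are discarded, effectively turning it into an inner join.
where_sql = """
    SELECT r.id, a.id
    FROM RealyBigTable AS r
    LEFT OUTER JOIN AnotherBigTable AS a ON a.big_id = r.id
    WHERE a.Status = ?
    ORDER BY r.id
"""

on_rows = conn.execute(on_sql, ("open",)).fetchall()
where_rows = conn.execute(where_sql, ("open",)).fetchall()
print(on_rows)     # [(1, 100), (2, None), (3, None)]
print(where_rows)  # [(1, 100)]
```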


To answer your other questions:

At what point is it best to filter my data?

  • In the WHERE clause.
  • Select the relevant rows into a temporary table first, and only then join it.
  • Add the predicate to the ON clause of the first inner join.
  • Something else?

Whether the predicate goes in the WHERE clause or in an ON clause, both are treated the same. For option 3, being in the "first" inner join is irrelevant: with multiple INNER JOINs, the order in which the tables appear in the query doesn't really matter, because the query optimizer will reorder the joins as it sees fit.
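A sketch of the three-table case from the question, using Python's sqlite3 module with hypothetical miniature tables (the big_id and another_id join columns are assumptions, since the question elides its join conditions). It checks results only: the predicate in WHERE, in the first ON, in the second ON, and with the tables written in reverse order all return the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE RealyBigTable (id INTEGER PRIMARY KEY, Type TEXT);
    CREATE TABLE AnotherBigTable (id INTEGER PRIMARY KEY, big_id INTEGER);
    CREATE TABLE YetAnotherBigTable (id INTEGER PRIMARY KEY, another_id INTEGER);
    INSERT INTO RealyBigTable VALUES (1, 'A'), (2, 'B'), (3, 'A');
    INSERT INTO AnotherBigTable VALUES (10, 1), (11, 2), (12, 3), (13, 3);
    INSERT INTO YetAnotherBigTable VALUES (100, 10), (101, 11), (102, 12), (103, 99);
""")

variants = {
    # Option 1: filter in WHERE.
    "where": """
        SELECT r.id, a.id, y.id
        FROM RealyBigTable AS r
        INNER JOIN AnotherBigTable AS a ON a.big_id = r.id
        INNER JOIN YetAnotherBigTable AS y ON y.another_id = a.id
        WHERE r.Type = ?""",
    # Option 3: filter in the first inner join's ON clause.
    "first_on": """
        SELECT r.id, a.id, y.id
        FROM RealyBigTable AS r
        INNER JOIN AnotherBigTable AS a ON a.big_id = r.id AND r.Type = ?
        INNER JOIN YetAnotherBigTable AS y ON y.another_id = a.id""",
    # Filter in the second inner join's ON clause instead.
    "second_on": """
        SELECT r.id, a.id, y.id
        FROM RealyBigTable AS r
        INNER JOIN AnotherBigTable AS a ON a.big_id = r.id
        INNER JOIN YetAnotherBigTable AS y ON y.another_id = a.id AND r.Type = ?""",
    # Tables written in the opposite order, filter back in WHERE.
    "reordered": """
        SELECT r.id, a.id, y.id
        FROM YetAnotherBigTable AS y
        INNER JOIN AnotherBigTable AS a ON a.id = y.another_id
        INNER JOIN RealyBigTable AS r ON r.id = a.big_id
        WHERE r.Type = ?""",
}

# Sort each result so row order (which depends on the chosen plan) is ignored.
results = {name: sorted(conn.execute(sql, ("A",)).fetchall())
           for name, sql in variants.items()}
for name, rows in results.items():
    print(name, rows)  # each variant: [(1, 10, 100), (3, 12, 102)]
```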

Using a temp table is completely unnecessary and will not help, because you still have to extract the relevant subset, which is exactly what the JOIN does anyway. Moreover, if you have a good index on the JOIN columns / WHERE predicate, the index will be used to read only the relevant rows instead of scanning the whole tables.
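A rough sketch of both points with Python's sqlite3 module (SQLite's planner, not SQL Server's; tables, data, and the index name idx_realy_type are hypothetical). The temp-table route returns exactly the same rows as the direct join, and EXPLAIN QUERY PLAN on the direct join should show a SEARCH step through the index rather than a full scan. The exact wording of the plan lines varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE RealyBigTable (id INTEGER PRIMARY KEY, Type TEXT, other_id INTEGER);
    CREATE TABLE AnotherBigTable (id INTEGER PRIMARY KEY, name TEXT);
    -- Hypothetical index supporting the WHERE predicate.
    CREATE INDEX idx_realy_type ON RealyBigTable (Type);
""")
conn.executemany("INSERT INTO AnotherBigTable VALUES (?, ?)",
                 [(i, f"name{i}") for i in range(100)])
# 5,000 rows, of which only every 50th row (100 rows in total) has Type 'A'.
conn.executemany("INSERT INTO RealyBigTable VALUES (?, ?, ?)",
                 [(i, "A" if i % 50 == 0 else "B", i % 100) for i in range(5000)])

# Direct route: filter and join in one statement.
direct_sql = """
    SELECT r.id, a.name
    FROM RealyBigTable AS r
    INNER JOIN AnotherBigTable AS a ON a.id = r.other_id
    WHERE r.Type = ?
"""
direct = sorted(conn.execute(direct_sql, ("A",)).fetchall())

# Temp-table route: copy the relevant rows first, then join the copy.
conn.execute("CREATE TEMP TABLE temp_realy (id INTEGER PRIMARY KEY, Type TEXT, other_id INTEGER)")
conn.execute("INSERT INTO temp_realy SELECT * FROM RealyBigTable WHERE Type = ?", ("A",))
via_temp = sorted(conn.execute("""
    SELECT t.id, a.name
    FROM temp_realy AS t
    INNER JOIN AnotherBigTable AS a ON a.id = t.other_id
""").fetchall())

# The last column of each EXPLAIN QUERY PLAN row is the human-readable step.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + direct_sql, ("A",))]
print(len(direct), direct == via_temp)  # 100 True
for step in plan:
    print(step)
```

The temp table only adds an extra write of the same 100 rows; the index already lets the direct join touch just those rows.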


Put your query in SQL Server Management Studio, enable "Include Actual Execution Plan" and run it. That way you get an exact answer as to what SQL Server did with your query. From there, you can move forward with optimization.

Generally:

  • Columns used in joins should be indexed.
  • Apply the most selective filter first.
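As an illustration of the first point, here is a sketch with Python's sqlite3 module (SQLite only; the tables and the index names idx_realy_type and idx_another_big_id are hypothetical). It compares EXPLAIN QUERY PLAN output for the same join before and after indexing the join column, and after the index exists the plan should look rows up through it instead of scanning the table.

```python
import sqlite3

def plan_steps(conn, sql, params=()):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE RealyBigTable (id INTEGER PRIMARY KEY, Type TEXT);
    CREATE TABLE AnotherBigTable (id INTEGER PRIMARY KEY, big_id INTEGER, payload TEXT);
    -- Index for the (hypothetically selective) filter column.
    CREATE INDEX idx_realy_type ON RealyBigTable (Type);
""")

sql = """
    SELECT r.id, a.payload
    FROM RealyBigTable AS r
    INNER JOIN AnotherBigTable AS a ON a.big_id = r.id
    WHERE r.Type = ?
"""

before = plan_steps(conn, sql, ("A",))
# Index the join column on AnotherBigTable.
conn.execute("CREATE INDEX idx_another_big_id ON AnotherBigTable (big_id)")
after = plan_steps(conn, sql, ("A",))

print("before:", before)
print("after: ", after)
```

In SQL Server the equivalent check is the actual execution plan described above; this sketch only shows the general effect of indexing a join column.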

In a decent cost-based query planner, what happens (in your case) is:

  • join conditions and WHERE conditions are analyzed at the same level

  • the join type and the statistics determine the access path (what happens first), so that the smallest possible intermediate results are produced (the least I/O)
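A small sketch of statistics steering the access path, using SQLite's planner through Python's sqlite3 module (SQL Server keeps its own statistics and is not shown here; the table, columns, and index names are hypothetical). After ANALYZE records that Type has only two distinct values while code is unique, the planner should pick the index on code, because it yields the smallest intermediate result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE RealyBigTable (id INTEGER PRIMARY KEY, Type TEXT, code INTEGER);
    CREATE INDEX idx_type ON RealyBigTable (Type);
    CREATE INDEX idx_code ON RealyBigTable (code);
""")
# Type has only two distinct values (low selectivity); code is unique (high selectivity).
conn.executemany("INSERT INTO RealyBigTable (Type, code) VALUES (?, ?)",
                 [("A" if i % 2 else "B", i) for i in range(10000)])

# Gather statistics; SQLite stores them in the sqlite_stat1 table.
conn.execute("ANALYZE")
print(conn.execute("SELECT idx, stat FROM sqlite_stat1 ORDER BY idx").fetchall())

sql = "SELECT id FROM RealyBigTable WHERE Type = ? AND code = ?"
details = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, ("A", 43))]
print(details)
```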

