Two radically different queries against 4 million records execute in the same time - one uses brute force

Question


I am using SQL Server 2008. I have a table with over 3 million records that is linked to another table with a million records.

I spent several days experimenting with various ways to query these tables. It has come down to two radically different queries, both of which take 6 seconds to run on my laptop.

The first query uses brute force to evaluate likely matches and removes incorrect matches via aggregate summing calculations.

The second gets all possible matches, and then removes the incorrect ones with an EXCEPT query that uses two dedicated indexes to find the low and high mismatches.

Logically, one would expect brute force to be slow and the indexed approach to be fast. Not so. And I experimented a lot with the indexes until I got the best speed.

In addition, the brute-force query does not require as many indexes, which means that technically it would yield better overall system performance.

Below are the two execution plans. If you can't see them, let me know and I will re-post them in landscape or mail them to you.

Brute-force query:

    SELECT ProductID, [Rank]
    FROM (
        SELECT p.ProductID, ptr.[Rank],
               SUM(CASE WHEN p.ParamLo < si.LowMin OR p.ParamHi > si.HiMax
                        THEN 1 ELSE 0 END) AS Fail
        FROM dbo.SearchItemsGet(@SearchID, NULL) AS si
        JOIN dbo.ProductDefs AS pd ON pd.ParamTypeID = si.ParamTypeID
        JOIN dbo.Params AS p ON p.ProductDefID = pd.ProductDefID
        JOIN dbo.ProductTypesResultsGet(@SearchID) AS ptr ON ptr.ProductTypeID = pd.ProductTypeID
        WHERE si.Mode IN (1, 2)
        GROUP BY p.ProductID, ptr.[Rank]
    ) AS t
    WHERE t.Fail = 0

[Image: execution plan for the brute-force query]

Index-based EXCEPT query:

    WITH si AS (
        SELECT DISTINCT pd.ProductDefID, si.LowMin, si.HiMax
        FROM dbo.SearchItemsGet(@SearchID, NULL) AS si
        JOIN dbo.ProductDefs AS pd ON pd.ParamTypeID = si.ParamTypeID
        JOIN dbo.ProductTypesResultsGet(@SearchID) AS ptr ON ptr.ProductTypeID = pd.ProductTypeID
        WHERE si.Mode IN (1, 2)
    )
    SELECT p.ProductID
    FROM dbo.Params AS p
    JOIN si ON si.ProductDefID = p.ProductDefID
    EXCEPT
    SELECT p.ProductID
    FROM dbo.Params AS p
    JOIN si ON si.ProductDefID = p.ProductDefID
    WHERE p.ParamLo < si.LowMin OR p.ParamHi > si.HiMax

[Image: execution plan for the index-based EXCEPT query]

My question is: based on the execution plans, which query looks more efficient? I realize that this may change as my data grows.

EDIT:

I updated the indexes and now have the following execution plan for the second query:

+4

performance sql-server sql-server-2008 sql-execution-plan

IamIC Jan 01 '10 at 18:50


4 answers

Trust the optimizer.

Write the query that most simply expresses what you are trying to achieve. If that query performs poorly, check whether any indexes are missing. But you still should not need to work with those indexes explicitly.

Do not concern yourself with how you would implement such a search yourself.

In very rare cases you may need to go further and force the query to use specific indexes (via hints), but that is probably 0.1% of queries.
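For reference, an index hint in T-SQL looks roughly like the sketch below. The index name IX_Params_1 is the one mentioned from the posted plans and is only assumed to belong to dbo.Params; the filter value and the int type of ProductDefID are placeholders, not details from the question.

```sql
-- Sketch only: forces the optimizer to use one particular index on dbo.Params.
-- Assumptions: IX_Params_1 exists on dbo.Params, and ProductDefID is an int.
SELECT p.ProductID
FROM dbo.Params AS p WITH (INDEX(IX_Params_1))
WHERE p.ProductDefID = 42;  -- placeholder value
```

A hint like this stays in force even when the data distribution changes, which is one reason to treat it as a last resort.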

In your posted plans, your “optimized” version causes scans against 2 indexes of your (I suppose) Params table (PK_Params_1, IX_Params_1). Without seeing the queries, it is hard to know why this is happening, but if you compare one scan of the table (“brute force”) against two, it is easy to see why the second is not more efficient.

I think I would try:

    SELECT p.ProductID, ptr.[Rank]
    FROM dbo.SearchItemsGet(@SearchID, NULL) AS si
    JOIN dbo.ProductDefs AS pd ON pd.ParamTypeID = si.ParamTypeID
    JOIN dbo.Params AS p ON p.ProductDefID = pd.ProductDefID
    JOIN dbo.ProductTypesResultsGet(@SearchID) AS ptr ON ptr.ProductTypeID = pd.ProductTypeID
    LEFT JOIN Params p_anti
        ON p_anti.ProductDefId = pd.ProductDefID
        AND (p_anti.ParamLo < si.LowMin OR p_anti.ParamHi > si.HiMax)
    WHERE si.Mode IN (1, 2)
        AND p_anti.ProductID IS NULL
    GROUP BY p.ProductID, ptr.[Rank]

i.e. introduce an anti-join that eliminates the results you do not want.
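The same anti-join idea can also be written with NOT EXISTS, which some readers find easier to follow than LEFT JOIN ... IS NULL. This is only a sketch that mirrors the query above using the same tables and functions; the int type and value of @SearchID are assumptions, and it has not been tested against the asker's schema, so whether it produces a better plan is an open question.

```sql
-- Assumption: @SearchID is an int; the question does not show its declaration.
DECLARE @SearchID int = 1;

SELECT p.ProductID, ptr.[Rank]
FROM dbo.SearchItemsGet(@SearchID, NULL) AS si
JOIN dbo.ProductDefs AS pd ON pd.ParamTypeID = si.ParamTypeID
JOIN dbo.Params AS p ON p.ProductDefID = pd.ProductDefID
JOIN dbo.ProductTypesResultsGet(@SearchID) AS ptr ON ptr.ProductTypeID = pd.ProductTypeID
WHERE si.Mode IN (1, 2)
  -- Keep a row only if no Params row for this definition falls outside the range.
  AND NOT EXISTS (
      SELECT 1
      FROM dbo.Params AS p_anti
      WHERE p_anti.ProductDefID = pd.ProductDefID
        AND (p_anti.ParamLo < si.LowMin OR p_anti.ParamHi > si.HiMax)
  )
GROUP BY p.ProductID, ptr.[Rank];
```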

+3

Damien_The_Unbeliever Jan 01 '10 at 19:03


In SQL Server Management Studio, put both queries in the same query window and get the query plan for both at once. It should determine the plans for both and give you a “percent of total batch” for each one. The query with the lower percent of the total batch should be the more efficient one.
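The batch percentages shown in the plan are derived from the optimizer's estimated costs, so it can help to cross-check them against measured I/O and timings. A minimal sketch, assuming you run it in the same query window; the COUNT(*) statement is only a stand-in for the two queries being compared:

```sql
-- Report logical reads and CPU/elapsed time for each statement in the Messages tab.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Stand-in statement: replace with the brute-force query, then the EXCEPT query.
SELECT COUNT(*) FROM dbo.Params;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

Comparing the logical reads reported for each query gives a figure that does not depend on the cost estimates.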

+1

goric Jan 01 '10 at 18:58


Does 6 seconds on a laptop = .006 seconds on production hardware? The part of your queries that concerns me is the clustered index scans shown in the query plan. In my experience, any time a query plan includes a CI scan, the query will only get slower as data is added to the table.

What do the two functions do, since they appear to be causing the table scans? Is it possible to persist that data in the database and update LowMin and HiMax as rows are added?

Looking at the two execution plans, neither is very good. Look at how far to the left the thick lines extend. Thick lines mean there are many rows. We need to reduce the number of rows earlier in the process so that we are not working with such large hash tables, large sorts, and nested loops.

By the way, how many rows does your source have, and how many rows are in the result set?

+1

RC_Cleland Jan 01 '10 at 23:31


IamIC · Accepted Answer · 2011-01-02T17:27:27+0000

Thank you all for your input and help.

Based on what you wrote, experimenting, and digging into the execution plan, I discovered that the answer is the tipping point.

Too many records were being returned to warrant use of the index.

See here (Kimberly Tripp).
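The tipping point concerns non-covering nonclustered indexes, where each matching row costs an extra lookup into the base table. One common way to sidestep it is a covering index, sketched below. The index name is hypothetical, and the column choice simply assumes that the queries in the question touch only ProductDefID, ProductID, ParamLo and ParamHi on dbo.Params; the extra storage and write cost would need to be weighed against the benefit.

```sql
-- Sketch only: a covering index so the queries above can be answered from the
-- index alone, without lookups into the base table.
-- Assumption: the index name is made up, and these four columns are the only
-- dbo.Params columns the queries need.
CREATE NONCLUSTERED INDEX IX_Params_ProductDefID_Covering
ON dbo.Params (ProductDefID)
INCLUDE (ProductID, ParamLo, ParamHi);
```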
