Can this SQL query be optimized to speed things up?

I have a SQL query (for SQL Server 2008 R2) that takes a very long time. Is there a better way to do this?

SELECT @count = COUNT(Name) FROM Table1 t WHERE t.Name = @name AND t.Code NOT IN (SELECT Code FROM ExcludedCodes) 

Table1 contains about 90 million rows and is indexed on Name and Code. ExcludedCodes contains only about 30 rows.

This query is in a stored procedure and gets called about 40 thousand times; the procedure takes 27 minutes in total to complete. I think this query is my biggest bottleneck because of the sheer number of rows it reads and how many times it runs.

So, if you know a better way to do this, that would be very helpful! If it cannot be optimized, I think I'm stuck with 27 minutes...

EDIT

I changed NOT IN to NOT EXISTS, and it reduced the time to 10:59, so that alone is a huge gain for me. I'm still trying to work out the GROUP BY statement suggested below, but that will require a complete rewrite of the stored procedure and may take some time... (As I said, I'm not the best at SQL, but it's starting to grow on me. ^^)

+4
5 answers

In addition to workarounds that make the query itself respond faster, have you considered maintaining a column in the table that indicates whether the row is in this set or not? This requires some maintenance, but if the ExcludedCodes table does not change often, that maintenance may be worth it. For example, you could add a BIT column:

 ALTER TABLE dbo.Table1 ADD IsExcluded BIT; 

Make it NOT NULL with a default of 0. Then you can create a filtered index:

 CREATE INDEX n ON dbo.Table1(name) WHERE IsExcluded = 0; 

Now you just need to update the table once:

 UPDATE t SET IsExcluded = 1 FROM dbo.Table1 AS t INNER JOIN dbo.ExcludedCodes AS x ON t.Code = x.Code; 

Going forward, you will have to maintain this with triggers on both tables. With that in place, your query would look like this:

 SELECT @Count = COUNT(Name) FROM dbo.Table1 WHERE Name = @name AND IsExcluded = 0; 
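
The trigger maintenance mentioned above might look roughly like the following for the ExcludedCodes side. This is only a sketch: the trigger name is made up, and it assumes IsExcluded has already been made NOT NULL with a default of 0. A matching trigger on Table1 (setting the flag when rows are inserted or their Code changes) would also be needed, but it depends on Table1's key, which is not shown in the question.

```sql
-- Hypothetical trigger name; assumes dbo.Table1.IsExcluded is BIT NOT NULL DEFAULT 0.
CREATE TRIGGER dbo.trg_ExcludedCodes_SyncFlag
ON dbo.ExcludedCodes
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;

    -- Codes present in ExcludedCodes after this statement: flag matching rows.
    UPDATE t
    SET IsExcluded = 1
    FROM dbo.Table1 AS t
    INNER JOIN inserted AS i ON t.Code = i.Code
    WHERE t.IsExcluded = 0;

    -- Codes removed by this statement and no longer present anywhere
    -- in ExcludedCodes: clear the flag again.
    UPDATE t
    SET IsExcluded = 0
    FROM dbo.Table1 AS t
    INNER JOIN deleted AS d ON t.Code = d.Code
    WHERE t.IsExcluded = 1
      AND NOT EXISTS (SELECT 1 FROM dbo.ExcludedCodes AS x WHERE x.Code = d.Code);
END;
```

Each change to ExcludedCodes then touches every Table1 row with that code, which is why this approach only pays off if the excluded list changes rarely.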

EDIT

Regarding "NOT IN is slower than LEFT JOIN", here is a simple test that I ran on just a few thousand rows:

[image: test results, not reproduced here]

EDIT 2

I'm not sure why this query wouldn't do what you need, and it should be far more efficient than looping 40K times:

 SELECT src.Name, COUNT(*) FROM dbo.Table1 AS src INNER JOIN #temptable AS t ON src.Name = t.Name WHERE src.Code NOT IN (SELECT Code FROM dbo.ExcludedCodes) GROUP BY src.Name; 

Or the LEFT JOIN equivalent:

 SELECT src.Name, COUNT(*) FROM dbo.Table1 AS src INNER JOIN #temptable AS t ON src.Name = t.Name LEFT OUTER JOIN dbo.ExcludedCodes AS x ON src.Code = x.Code WHERE x.Code IS NULL GROUP BY src.Name; 

I would put money on either of these queries finishing in less than 27 minutes. I would even suggest that running both queries in sequence would be much faster than your one query that takes 27 minutes.

Finally, you might consider an indexed view. I don't know your table structure or whether you would violate any of the restrictions, but it is worth investigating IMHO.
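
One way the indexed view idea could be sketched, under several assumptions: the view and index names are invented, Name and Code together fit within the index key size limit, and many rows share the same (Name, Code) pair so that pre-aggregating actually shrinks the data. Indexed views cannot contain subqueries or outer joins, so the exclusion is applied when querying the view rather than inside it.

```sql
-- Session settings required when creating an index on a view.
SET NUMERIC_ROUNDABORT OFF;
SET ANSI_PADDING, ANSI_WARNINGS, CONCAT_NULL_YIELDS_NULL,
    ARITHABORT, QUOTED_IDENTIFIER, ANSI_NULLS ON;
GO

-- Hypothetical view: one row per (Name, Code) with its row count.
CREATE VIEW dbo.vTable1NameCodeCounts
WITH SCHEMABINDING
AS
SELECT Name, Code, COUNT_BIG(*) AS RowCnt   -- COUNT_BIG(*) is mandatory here
FROM dbo.Table1
GROUP BY Name, Code;
GO

-- The unique clustered index is what materializes the view.
CREATE UNIQUE CLUSTERED INDEX IX_vTable1NameCodeCounts
ON dbo.vTable1NameCodeCounts (Name, Code);
GO

-- Inside the stored procedure, where @name and @count are already declared:
SELECT @count = ISNULL(SUM(v.RowCnt), 0)
FROM dbo.vTable1NameCodeCounts AS v WITH (NOEXPAND)
WHERE v.Name = @name
  AND NOT EXISTS (SELECT 1 FROM dbo.ExcludedCodes AS x WHERE x.Code = v.Code);
```

The NOEXPAND hint makes the query read the materialized view directly; on editions other than Enterprise the optimizer does not match indexed views automatically without it. Writes to Table1 become more expensive because the view has to be kept in sync.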

+5

You say this is called about 40K times. Why? Is it in a cursor? Do you really need a cursor for that? Could you put the desired values for @name in a temp table, index it, and then join to it?

 select t.name, count(t.name) from Table1 t join #name n on t.name = n.name where NOT EXISTS (SELECT Code FROM ExcludedCodes WHERE Code = t.code) group by t.name 

This can give you all your results in a single query and is almost certainly faster than 40K individual queries. Of course, if you need counts for all the names, it's even simpler:

 select t.name, count(t.name) from Table1 t where NOT EXISTS (SELECT Code FROM ExcludedCodes WHERE Code = t.code) group by t.name 
+4

NOT EXISTS usually works better than NOT IN , but you should test it on your system.

 SELECT @count = COUNT(Name) FROM Table1 t WHERE t.Name = @name AND NOT EXISTS (SELECT 1 FROM ExcludedCodes e WHERE e.Code = t.Code) 

Without knowing more about your procedure, it is difficult to offer specific optimization suggestions (that is, code you could copy and paste). Does it really need to run 40,000 times? It sounds like your stored procedure should be reworked if possible. You could run something like the above once at the beginning of the proc and insert the results into a temp table, which can keep the indexes from Table1, and then join to that instead of running this query.

This particular bit may not even be the bottleneck that makes your procedure run for 27 minutes. For example, are you using a cursor over those 90 million rows, or scalar-valued UDFs in your WHERE clauses?
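
A rough sketch of that "run it once, then look it up" idea, assuming the temp table and index names below (both invented) and that Name is short enough to be an index key:

```sql
-- Run once at the start of the procedure: a single pass over Table1.
SELECT t.Name, COUNT(*) AS NameCount
INTO #NameCounts                       -- hypothetical temp table name
FROM dbo.Table1 AS t
WHERE NOT EXISTS (SELECT 1 FROM dbo.ExcludedCodes AS e WHERE e.Code = t.Code)
GROUP BY t.Name;

-- One row per name, so a unique clustered index makes each lookup a seek.
CREATE UNIQUE CLUSTERED INDEX IX_NameCounts ON #NameCounts (Name);

-- Later, wherever the original per-name query ran.
-- ISNULL keeps the original behavior of returning 0 for names with no rows.
SELECT @count = ISNULL((SELECT NameCount FROM #NameCounts WHERE Name = @name), 0);
```

This counts every name in Table1, which may be more work than necessary if only 40K names are of interest; in that case, joining to a temp table holding just those names would limit the scan.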

+2

Have you thought about running the query once and storing the results in a temp table or table? Something like

 insert into #temp (name, Namecount) select name, count(name) from table1 where code not in (select code from excludedcodes) group by name 

And do not forget that you can use a filtered index if the table of excluded codes is somewhat static.

+1

Start by evaluating your execution plan. Which part is the most expensive to compute? As for the relationship between the two tables, use a JOIN on indexed columns: indexes speed up query execution.
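
As a starting point for that evaluation, the query can be run on its own with I/O and timing statistics switched on. The variable type and sample value below are assumptions, since the question does not show them:

```sql
SET STATISTICS IO ON;    -- reports logical and physical reads per table
SET STATISTICS TIME ON;  -- reports compile and execution times

DECLARE @name VARCHAR(100) = 'example';  -- hypothetical type and value
DECLARE @count INT;

SELECT @count = COUNT(Name)
FROM Table1 t
WHERE t.Name = @name
  AND NOT EXISTS (SELECT 1 FROM ExcludedCodes e WHERE e.Code = t.Code);

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

In SQL Server Management Studio, enabling "Include Actual Execution Plan" before running this shows which operators (scans, seeks, lookups) account for most of the cost.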

0
