SQL LIKE slows queries to a crawl (performance issues)

I have been reading up and found that using LIKE causes a significant slowdown in queries.

A colleague recommended that we use

 SELECT Name FROM mytable a WHERE a.Name IN (SELECT Name FROM mytable WHERE Name LIKE '%' + ISNULL(@Name, N'') + '%' GROUP BY Name)

instead of

 SELECT Name FROM mytable a WHERE a.Name LIKE '%' + ISNULL(@Name, N'') + '%'

Now, I am no SQL expert, and I do not really understand the inner workings of these statements. Is this a better option, worth the effort of typing a few extra characters with each statement? Is there an even better (and easier to type) alternative?

+6
sql sql-server tsql sql-like
3 answers

There are several performance issues to address...

Do not access the same table more than once if possible

Do not use a subquery for criteria that can be satisfied without referencing additional copies of the same table. A subquery is acceptable if you need data from a copy of the table because of aggregate functions (MAX, MIN, etc.), although analytic functions (ROW_NUMBER, RANK, etc.) may be more convenient, provided they are supported.
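As a rough illustration of the difference, here is a sketch that fetches the newest row per name both ways. The table dbo.mytable and the column CreatedAt are assumptions made up for this example, not something from the question:

```sql
-- Hypothetical columns: Name, CreatedAt. Goal: the newest row per Name.

-- Subquery version: references dbo.mytable twice.
SELECT a.Name, a.CreatedAt
  FROM dbo.mytable AS a
 WHERE a.CreatedAt = (SELECT MAX(b.CreatedAt)
                        FROM dbo.mytable AS b
                       WHERE b.Name = a.Name);

-- Analytic version: references dbo.mytable once.
SELECT x.Name, x.CreatedAt
  FROM (SELECT a.Name, a.CreatedAt,
               ROW_NUMBER() OVER (PARTITION BY a.Name
                                  ORDER BY a.CreatedAt DESC) AS rn
          FROM dbo.mytable AS a) AS x
 WHERE x.rn = 1;
```

The two are not strictly identical: if several rows share the newest CreatedAt for a name, the subquery version returns all of them, while ROW_NUMBER keeps exactly one.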

Do not compare what you do not need to

If your parameter is NULL, and that means you want any value for the column you are comparing against, do not include the filter criteria at all. Statements like this:

 WHERE a.Name LIKE '%' + ISNULL(@Name, N'') + '%' 

... guarantee that the optimizer will have to compare values for the name column, wildcards or not. Worse still, with LIKE, putting a wildcard on the left side of the pattern ensures that an index cannot be used, even if one is present on the column being searched.

A more efficient approach:

 IF @Name IS NOT NULL
 BEGIN
   SELECT ...
     FROM ...
    WHERE a.name LIKE '%' + @Name + '%'
 END
 ELSE
 BEGIN
   SELECT ...
     FROM ...
 END

Well-performing SQL is all about tailoring the query to exactly what you need. This is why you should consider dynamic SQL when you have queries with two or more independent criteria.
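A minimal sketch of that dynamic SQL idea, assuming a hypothetical table dbo.mytable and a second optional parameter @City that is invented purely for illustration. Only the predicates whose parameters were supplied are appended, and the statement runs through sp_executesql so the values stay parameterized:

```sql
-- Hypothetical optional search parameters; NULL means "do not filter on this".
DECLARE @Name NVARCHAR(100), @City NVARCHAR(100);
SET @Name = N'Smith';
SET @City = NULL;

DECLARE @sql NVARCHAR(MAX);
SET @sql = N'SELECT a.Name FROM dbo.mytable AS a WHERE 1 = 1';

-- Append only the predicates that are actually needed.
IF @Name IS NOT NULL
    SET @sql = @sql + N' AND a.Name LIKE N''%'' + @Name + N''%''';
IF @City IS NOT NULL
    SET @sql = @sql + N' AND a.City = @City';

-- Values are passed as parameters, never concatenated into the string.
EXEC sys.sp_executesql
     @sql,
     N'@Name NVARCHAR(100), @City NVARCHAR(100)',
     @Name = @Name,
     @City = @City;
```

Each distinct combination of criteria produces its own statement text, so each one can get a plan suited to it instead of one catch-all plan.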

Use the right tool

The LIKE operator is not very efficient at searching text when you are checking for the existence of a string within text data. Full Text Search (FTS) technology was designed to address these shortcomings:

 IF @Name IS NOT NULL
 BEGIN
   SELECT ...
     FROM ...
    WHERE CONTAINS(a.name, @Name)
 END
 ELSE
 BEGIN
   SELECT ...
     FROM ...
 END
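CONTAINS only works once the column has a full-text index. The following is a sketch of the one-time setup and a prefix search; the catalog name ftCatalog, the table dbo.mytable and the key index PK_mytable are all hypothetical:

```sql
-- One-time setup (all object names are hypothetical).
CREATE FULLTEXT CATALOG ftCatalog;

CREATE FULLTEXT INDEX ON dbo.mytable (Name)
    KEY INDEX PK_mytable   -- must be a unique, single-column, non-nullable index
    ON ftCatalog;

-- Prefix search: rows containing a word that starts with the search value.
DECLARE @Name NVARCHAR(100);
SET @Name = N'Smith';      -- hypothetical search value

DECLARE @term NVARCHAR(110);
SET @term = N'"' + REPLACE(@Name, N'"', N'') + N'*"';   -- builds "Smith*"

SELECT a.Name
  FROM dbo.mytable AS a
 WHERE CONTAINS(a.Name, @term);
```

Note that full-text search matches whole words and word prefixes, not arbitrary substrings, so it is not a drop-in replacement for LIKE '%...%' when you need to match text in the middle of a word.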

Always test and compare

I agree with LittleBobbyTables: the solution ultimately relies on checking the query/execution plan for all the alternatives, because table design and data can affect the optimizer's decisions and performance. In SQL Server, the version with the lowest subtree cost is the most efficient, but that can change over time if table statistics and indexes are not maintained.
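One way to make that comparison, sketched here against the hypothetical table dbo.mytable from the question, is to run both variants in a single session with I/O and timing statistics switched on:

```sql
-- Report I/O and timing for each statement in this session.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

DECLARE @Name NVARCHAR(100);
SET @Name = N'Smith';   -- hypothetical search value

-- Variant 1: plain LIKE
SELECT a.Name
  FROM dbo.mytable AS a
 WHERE a.Name LIKE N'%' + @Name + N'%';

-- Variant 2: LIKE wrapped in an IN subquery
SELECT a.Name
  FROM dbo.mytable AS a
 WHERE a.Name IN (SELECT Name
                    FROM dbo.mytable
                   WHERE Name LIKE N'%' + @Name + N'%'
                   GROUP BY Name);

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

The logical reads and elapsed times appear in the messages output. In SQL Server Management Studio, enabling "Include Actual Execution Plan" for the same batch also shows the plans side by side with their relative costs.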

+9

Just compare the execution plans and you will see the difference.

I do not have your exact data, but I ran the following queries against a SQL Server 2005 database (yes, it's nerdy):

 SELECT UnitName
   FROM Units
  WHERE (UnitName LIKE '%Space Marine%')

 SELECT UnitName
   FROM Units
  WHERE UnitName IN (
        (SELECT UnitName
           FROM Units
          WHERE UnitName LIKE '%Space Marine%'
          GROUP BY UnitName)
        )

Here are my execution plan results:

[Image: execution plans for the two queries]

Your colleague’s proposal adds a subquery and a second clustered index scan to my query, as you can see above. Your mileage may vary, but be sure to check your execution plans to see how they compare. I can’t imagine how it would be more efficient.

+8

Unless IIQR is some smaller table that indexes the names somehow (and is not the original table being queried here from the start), I don't see how the longer version helps at all; it does exactly the same thing, but adds the extra step of building a result set that is then used in the IN.

But I would be dubious even if IIQR were a smaller "index" table. I would want to know more about the database in question and what the execution plan ends up being for each query.

LIKE can adversely affect query performance because it often requires a table scan: physically loading the relevant field of each record and searching it for the text in question. This is likely to be true even if the field is indexed. But there may be no way around it if you need to search for partial text at any possible position within the field.

Depending on the size of the table in question, though, it may not really matter.

For you, though, I would suggest that keeping it simple is best. If you do not really know what effect complicating the query will have on performance, it can be hard to decide which way to go.
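If matching from the start of the field is enough for your use case, the picture changes. The sketch below assumes a hypothetical table dbo.mytable and index IX_mytable_Name; with only a trailing wildcard the optimizer can typically seek on the index, while a leading wildcard forces every row to be examined:

```sql
-- Hypothetical index on the searched column.
CREATE INDEX IX_mytable_Name ON dbo.mytable (Name);

DECLARE @Name NVARCHAR(100);
SET @Name = N'Smi';   -- hypothetical search value

-- Trailing wildcard only: an index seek on IX_mytable_Name is possible.
SELECT a.Name
  FROM dbo.mytable AS a
 WHERE a.Name LIKE @Name + N'%';

-- Leading wildcard: the whole index or table has to be scanned.
SELECT a.Name
  FROM dbo.mytable AS a
 WHERE a.Name LIKE N'%' + @Name + N'%';
```

Whether a seek is actually chosen depends on the data and statistics, so it is worth confirming in the execution plan.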

+4