The role of selectivity in index seeks / scans

I have read in many books and SQL articles that selectivity is an important factor when creating an index, and that if a column has low selectivity, an index on it can actually hurt query performance. But none of those articles explains why. Can someone explain why this is so, or link to an article that does?


From Robert Sheldon's Simple Talk article, 14 SQL Server Indexing Questions You Were Too Shy to Ask:

The ratio of unique values in a key column is referred to as index selectivity. The more unique the values, the higher the selectivity, which means that a unique index has the highest possible selectivity. The query engine loves highly selective key columns, especially if those columns are referenced in the WHERE clause of your frequently run queries. The higher the selectivity, the faster the query engine can reduce the size of the result set. The flip side, of course, is that a column with relatively few unique values is seldom a good candidate for indexing.
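As a quick way to see what this ratio looks like for your own data, here is a minimal T-SQL sketch; the table and column names (`dbo.Orders`, `Status`) are hypothetical placeholders, not from the quoted article:

```sql
-- Estimate a column's selectivity as distinct values / total rows.
-- dbo.Orders and Status are hypothetical names; substitute your own.
SELECT
    COUNT(DISTINCT Status) AS DistinctValues,
    COUNT(*)               AS TotalRows,
    COUNT(DISTINCT Status) * 1.0 / COUNT(*) AS Selectivity
FROM dbo.Orders;
-- A value near 1.0 means highly selective (a good index key candidate);
-- a value near 0 means many duplicates (a poor index key on its own).
```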

Also check out these articles:

From a SQLServerCentral article:

In general, a non-clustered index should be selective. The values in the column should be fairly unique, and queries that filter on it should return small portions of the table.

The reason for this is that key/RID lookups are expensive operations, so if a non-clustered index is to be used to evaluate a query, it needs to be either covering or selective enough that the lookup costs aren't considered too high.

If SQL Server considers the index (or the subset of the index keys that the query filters on) insufficiently selective, then it is very likely that the index will be ignored and the query will be executed as a clustered index (table) scan.

It is important to note that this applies not only to the leading column. There are scenarios in which a very non-selective column can be used as the leading column, with the rest of the columns in the index being selective enough for the index to be used.
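A sketch of that last scenario, with hypothetical table and column names (a two-valued `IsActive` flag is very non-selective on its own, but the composite key is usable when paired with a selective `CustomerId`):

```sql
-- Hypothetical example: IsActive alone has only two values,
-- but combined with the selective CustomerId the composite key works.
CREATE NONCLUSTERED INDEX IX_Orders_IsActive_CustomerId
    ON dbo.Orders (IsActive, CustomerId);

-- A query filtering on both columns can still seek on this index,
-- even though the leading column by itself would not justify one:
SELECT OrderId
FROM dbo.Orders
WHERE IsActive = 1
  AND CustomerId = 42;
```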


I will try to give a very simple explanation (based on my current knowledge of SQL Server):

If an index has low selectivity, it means that for a given value a larger percentage of the total number of rows is returned (e.g. 200 out of 500 rows share the same value in your indexed column).

Usually, if the index does not contain all the columns that your query needs, it only stores a pointer to where the associated row is physically located. In a second step, the engine then has to go and read that row.

So, as you can see, such a lookup takes two steps. And this is where selectivity comes in:

The lower the selectivity, the more rows the index returns, and the more of this double work the engine has to do. Because of this, there are cases where even a full table scan is more efficient than a seek on an index with very low selectivity.
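The two-step cost described above can be sketched in T-SQL; all object names here (`dbo.Orders`, `Status`, `OrderDate`) are hypothetical, chosen only to illustrate the lookup and the covering-index fix:

```sql
-- A key lookup happens when the query needs columns that are not in
-- the non-clustered index:
CREATE NONCLUSTERED INDEX IX_Orders_Status
    ON dbo.Orders (Status);

-- This query seeks on IX_Orders_Status, then performs one key lookup per
-- matching row to fetch OrderDate; with low selectivity that is many lookups:
SELECT OrderId, OrderDate
FROM dbo.Orders
WHERE Status = 'Pending';

-- Making the index covering removes the second step entirely:
CREATE NONCLUSTERED INDEX IX_Orders_Status_Covering
    ON dbo.Orders (Status)
    INCLUDE (OrderDate);
```

Past some fraction of matching rows, the optimizer will estimate that the per-row lookups cost more than simply scanning the clustered index, which is exactly why the low-selectivity index gets ignored.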


Source: https://habr.com/ru/post/923946/

