Table scan and indexing in SQL

What is the difference between table scanning and index in SQL and where is it used specifically?

+7
source share
4 answers

Scanning a table means iterating over all rows of the table.

Index scanning means iterating over all elements of the index, when the index of the element matches the search condition, the table row is retrieved through the index.

A regular index scan is less expensive than a table scan because the index is flatter than a table.

They have a lot of bibliography about this issue. Example:

Index access is an access method in which SQL Server uses an existing index to read and write data pages. Since access to the index significantly reduces the number of read / write operations, it often outperforms table scans.

In this method, a row is retrieved by moving the index using the indexed column values โ€‹โ€‹specified in the instruction. An index scan retrieves data from an index based on the value of one or more columns in the index. To perform index scanning, Oracle searches for the index for the indexed values โ€‹โ€‹of the columns accessed by the operator. If an expression refers only to index columns, then Oracle reads the indexed values โ€‹โ€‹of the column directly from the index, not from the table.

+12
source

Most query engines have a query optimizer that attempts to create an effective query execution strategy. If indexes are available that can make the query faster, then the query optimizer will scan the index or search for the index, otherwise it will scan the table.

Example:

SELECT * FROM tbl WHERE category_id = 5; 

If the category_id index is missing, then the table will be scanned, that is, each individual record in the table will be checked for the correct category_name.

If, however, category_id is indexed, things become more complex. If the table is very large, index search is likely to be selected. However, if the table is small, the optimizer may decide that scanning the table is faster, because accessing the index requires some overhead. If category_id is not selective enough, for example, if there are only two categories, table scans can be faster even for large tables.

Indexes are usually organized as tree structures. Finding an item in a tree is an O (log n) operation. Scanning a table is an O (n) operation. The speed is determined mainly by the number of disk accesses required to complete the request. First, index search and subsequent table access for found records can generate more disks for small tables.

Let's look at another query:

 SELECT category_id FROM tbl WHERE category_id BETWEEN 10 AND 100; 

Here is another option. In this situation, index search may not be faster than table scan, but since we only retrieve catergory_id, index scan (not index search) may be even faster. An index scan reads each entry in the index table instead of using a tree structure (which makes an index search). However, since the requested information is completely contained in the index, access to the data table is not required. Scanning an index, like a table, performs O (n) operation, but since the index is usually smaller than the table, scanning the index requires less disk access than scanning the table.

The whole question is very complex and largely depends on the database engine. If you want to know more, read the documentation provided by the db provider.

+10
source

As @danihp answered the first part of the question, I will try to answer the second "where is it used specifically". This is for Oracle, but it is true for most DBMSs.

Suppose we have a table my_table that is uniquely indexed in the id column and has a second index that is not unique in the yet_another_column column:

 create my_table ( id varchar2(20) not null , another_column not null , yet_another_column , constraint pk_my_table primary key (id) ); create index i_my_table on my_table ( yet_another_column ); 

Now, if we were select * from my_table where id = '1' , this should / should have performed a unique scan of the pk_my_table index index. Then we re-enter the table using the index to return everything to my_table , where id = '1' .

If the request was instead of select id from my_table where id = 'a' , then there is no need for the second step, since all the values โ€‹โ€‹that we need are contained inside the index. In this case, the query will only perform a unique index scan.

Then, if our query was select * from my_table where yet_another_column = 'y' , then we have an index in the column, but it is not unique , so we will need to look at the entire index to try to find all the values โ€‹โ€‹that match our condition, t .e. index scan. Once again, we select columns that are not listed in our index, so we need to re-enter the table to get them.

Finally, if our request was select id from my_table where another_column = 'yes' . We do not have an index on another_column , so we need to scan the table to find the value, i.e. We should find everything in the table where another_column = 'yes' .

Now, there may not be many differences between crawling a table and indexing in these cases. We still need to find the value in the object in the database. However, since the index is much smaller and specifically designed for scanning (see Other Answers), it usually scans indexes much faster if only a small fraction of the rows in the table are required . If you want to say 10% of the table, then this point becomes "it depends."

+2
source

For SQL Server at least:

Index scanning can be faster, because the index does not seem to cover the entire set of columns in the table, while scanning the table (or clustered index) should read all the data. If the index contains all the columns in the table, then it should be roughly equivalent to scanning the table, and the choice between scanning the index and the table (or CIX) will be a coin toss. The difference is that when there are fewer columns in the index, you can place more rows of the index on an 8 KB page, which will reduce the number of pages you have to read in order to crawl all the data in the index.

To illustrate what I mean, imagine if you have two copies of the phone book, one with the last name, first name, address, and phone number, and the other with the last name, first name, and phone number. Now imagine that since the street address does not need to be printed, you can place two additional columns of names and phone numbers on any page of the phone book. The end result of this is that the phone book is thinner because you can place the same number of phone numbers on fewer pages. Then imagine that you are tasked with counting the number of phone numbers in the book. Which one did you choose, one that has a specified street address (which has more pages, similar to scanning a table) or one that does not have a street address (which has fewer pages, similar to most index scans)? I would choose one that has fewer pages.

Another wrinkle is that some indexes can be filtered out, which means that in most cases they don't have fewer columns (and therefore can fit more rows on one page), but they can also have a WHERE clause that removes a lot of rows . In this case, index scans will be better than table scans (but this will only work for queries that have the corresponding WHERE clause and the same semantics).

+2
source

All Articles