What is the difference between table scans and cluster index scans?

Since both a Table Scan and the Clustered Index Scan essentially scan all records in a table, why is scanning using a clustered index supposedly better?

As an example, what is the difference in performance between the following when there are many records ?:

 declare @temp table( SomeColumn varchar(50) ) insert into @temp select 'SomeVal' select * from @temp ----------------------------- declare @temp table( RowID int not null identity(1,1) primary key, SomeColumn varchar(50) ) insert into @temp select 'SomeVal' select * from @temp 
+60
sql sql-server indexing
Aug 20 '08 at 20:38
source share
3 answers

In a table without a clustered index (heap table), the data pages are not connected to each other, so moving the pages requires a search in the index distribution map .

However, in a clustered table there are data pages linked in a doubly linked list - sequential scan acceleration. Of course, in return, you have the overhead of maintaining the data pages in order for INSERT , UPDATE and DELETE . However, the heap table requires a second entry in the IAM.

If your query has a RANGE operator (for example: SELECT * FROM TABLE WHERE Id BETWEEN 1 AND 100 ), then the cluster table (in guaranteed order) will be more efficient - since it can use index pages to find the corresponding data page (s). The heap will have to scan all rows, because it cannot rely on ordering.

And of course, a clustered index allows you to execute CLUSTERED INDEX SEEK, which is pretty much optimal for performance ... a bunch without indexes will always result in a table scan.

So:

  • In your query example, where you select all rows, the only difference is the double-linked list, supported by a clustered index. This should make your clustered table a little faster than a bunch with lots of rows.

  • For a query with a WHERE that may (at least partially) be satisfied with a clustered index, you will come forward because of the order β€” so you don’t have to scan the entire table.

  • For a query that is not satisfied with the clustered index, you pretty much even ... again, with the only difference being that it is a doubly linked list for sequential scanning. In any case, you are suboptimal.

  • For INSERT , UPDATE and DELETE heap may or may not win. The heap does not have to maintain order, but requires a second entry for the IAM. I think the relative difference in performance will be negligible, but also very data dependent.

Microsoft has a white paper that compares a clustered index with an equivalent non-clustered index on a heap (not quite the same as I discussed above, but close). Their conclusion is to put a clustered index in all tables. I will do my best to summarize their results (again, note that they really compare the non-clustered index with the clustered index here, but I think this is relatively comparable):

  • INSERT performance: the clustered index wins about 3% due to the second record needed for the heap.
  • UPDATE performance: the grouped index wins by about 8% due to the second search needed for the heap.
  • DELETE performance: the clustered index wins by about 18% due to the need for a second search and second deletion needed from the IAM for the heap.
  • single SELECT performance: the clustered index wins by about 16% due to the second search needed for the heap.
  • range SELECT performance: a clustered index wins about 29% due to random ordering for heaps.
  • concurrent INSERT : the heap table wins 30% under load due to pagination for a clustered index.
+68
Aug 20 '08 at 21:32
source share

http://msdn.microsoft.com/en-us/library/aa216840(SQL.80).aspx

The logical and physical scan operator of the clustered index scans the clustered index specified in the Argument column. When an additional predicate WHERE :() is present, only those rows that satisfy the predicate are returned. If the Argument column contains an ORDERED clause, the query processor requested that the row output be returned in the order in which the clustered index sorted them. If the ORDERED clause is missing, the storage engine scans the index optimally (without guaranteeing output sorting).

http://msdn.microsoft.com/en-us/library/aa178416(SQL.80).aspx

The logical and physical table scan operator extracts all rows from the table specified in the Argument column. If the WHERE :() predicate appears in the Argument column, only those rows that match the predicate are returned.

+4
Aug 20 '08 at 20:54
source share

A table scan should check each row of the table. To scan a clustered index, only an index scan is required. It does not check every entry in the table. These are, in fact, indices.

-one
Aug 20 '08 at 20:44
source share



All Articles