Dropping the primary key (clustered index) to increase insert performance

We have been investigating SQL timeouts and determined that the audit table is the bottleneck: all tables in our system have insert, update, and delete triggers that write a new audit record.

This means that the audit table is the largest and busiest table in the system. However, data only goes in and never comes out (within this system), so select performance is not a requirement.

Running select top 10 returns the most recently inserted records, not the "first" records. order by works, of course, but I would expect select top to return rows based on their order on disk, that is, the rows with the lowest PK values.

It has been suggested that we drop the clustered index and, in fact, the primary key (unique constraint) as well. As I mentioned earlier, there is no need to select from this table in this system.

What is the performance impact of a clustered index on a table? What are the (non-select) consequences of having an unindexed, unclustered, keyless table? Any other suggestions?

Edit

Our auditing involves CLR functions, and I am now benchmarking with and without PKs, indexes, FKs, etc. to determine the relative cost of the CLR functions versus the constraints.

After investigation, the poor performance turned out not to be related to the insert itself, but to the CLR function that orchestrated the auditing. After removing the CLR and using a straight T-SQL procedure instead, performance improved 20 times.

During testing, I also determined that the clustered index and identity column had little effect on insert time, at least relative to any other processing that takes place.

    // updating 10k rows in a table with trigger

    // using CLR function
    PK (identity, clustered) - ~78000ms
    No PK, no index          - ~81000ms

    // using straight TSQL
    PK (identity, clustered) - 2174ms
    No PK, no index          - 2102ms
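For illustration only, a straight T-SQL audit trigger of the kind described above might look roughly like the sketch below. The table, column, and trigger names (dbo.Customer, dbo.AuditLog, trg_Customer_Audit) are hypothetical and are not the real schema; the sketch assumes SQL Server 2008 or later.

```sql
-- Hypothetical audit table: identity PK, clustered, so inserts append at the end.
CREATE TABLE dbo.AuditLog (
    AuditId     BIGINT IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_AuditLog PRIMARY KEY CLUSTERED,
    TableName   SYSNAME      NOT NULL,
    AuditAction CHAR(1)      NOT NULL,  -- 'I', 'U' or 'D'
    KeyValue    INT          NOT NULL,
    ChangedAt   DATETIME2(3) NOT NULL
        CONSTRAINT DF_AuditLog_ChangedAt DEFAULT SYSUTCDATETIME()
);
GO

-- Hypothetical audited table.
CREATE TABLE dbo.Customer (
    CustomerId INT           NOT NULL CONSTRAINT PK_Customer PRIMARY KEY,
    Name       NVARCHAR(100) NOT NULL
);
GO

CREATE TRIGGER dbo.trg_Customer_Audit
ON dbo.Customer
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;

    -- One set-based INSERT per triggering statement, however many rows changed.
    -- A row present in both pseudo-tables is an update; only in "inserted" is
    -- an insert; only in "deleted" is a delete.
    INSERT INTO dbo.AuditLog (TableName, AuditAction, KeyValue)
    SELECT N'Customer',
           CASE
               WHEN i.CustomerId IS NOT NULL AND d.CustomerId IS NOT NULL THEN 'U'
               WHEN i.CustomerId IS NOT NULL THEN 'I'
               ELSE 'D'
           END,
           COALESCE(i.CustomerId, d.CustomerId)
    FROM inserted AS i
    FULL OUTER JOIN deleted AS d
        ON d.CustomerId = i.CustomerId;
END;
GO
```

With this shape, an update that changes the key value is recorded as one 'D' row and one 'I' row rather than a single 'U' row, which may or may not be acceptable for a given audit design.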
4 answers

According to Kimberly Tripp - Queen of Indexing - having a clustered index on a table actually helps INSERT performance:

The Clustered Index Debate Continues...

  • Inserts are faster in a clustered table (but only in the "right" clustered table) than in a heap. The main problem here is that lookups in the IAM/PFS to determine the insert location in a heap are slower than in a clustered table (where the insert location is known, defined by the clustered key). Inserts are faster when they go into a table where the order is defined (CL) and where that order is ever-increasing.

Source: blog post, The Clustered Index Debate Continues...


An excellent test script, along with a description of it, can be found on Tibor Karaszi's blog at SQLblog.com.

My numbers do not entirely match his: I see a bigger difference with a single batch statement than with row-by-row statements.

With a row count of about a million, I fairly consistently see the single-row insert loop on the clustered index running slightly faster than on the non-indexed table (the clustered table taking approximately 97% of the time of the non-indexed one).

Conversely, the batch insert (10,000 rows) is faster into the non-indexed table than into the clustered index (somewhere around 75%-85% of the clustered insert time).

    clustered - loop          - 1689
    heap      - loop          - 1713
    clustered - one statement - 85
    heap      - one statement - 62

He describes what happens on each insert:

Heap: SQL Server needs to find where the row should go. To do this, it uses one or more IAM pages for the heap and cross-references these with one or more PFS pages for the database file(s). IMO, there is potential for noticeable overhead here. Even more so with many users hammering the same table: I can imagine blocking (waits) on the PFS and possibly also the IAM pages.

Clustered table: Now this is dead simple. SQL Server navigates the clustered index tree and finds where the row should go. Since this is an ever-increasing index key, each row will go to the end of the table (linked list).
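The loop versus single-statement comparison above can be approximated with a sketch like the following. This is not Tibor Karaszi's script; the table names, payload size, and row counts are assumptions chosen for illustration, and the results will depend heavily on hardware, recovery model, and concurrent load.

```sql
-- Hypothetical test tables: identical except for the clustered primary key.
CREATE TABLE dbo.InsertTest_Heap (
    Id      INT IDENTITY(1,1) NOT NULL,
    Payload CHAR(200)         NOT NULL
);
CREATE TABLE dbo.InsertTest_Clustered (
    Id      INT IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_InsertTest_Clustered PRIMARY KEY CLUSTERED,
    Payload CHAR(200)         NOT NULL
);
GO

DECLARE @start DATETIME2(3), @i INT;

-- Row-by-row loop into the heap.
SET @start = SYSDATETIME();
SET @i = 0;
WHILE @i < 10000
BEGIN
    INSERT INTO dbo.InsertTest_Heap (Payload) VALUES ('x');
    SET @i += 1;
END;
SELECT 'heap - loop' AS test,
       DATEDIFF(MILLISECOND, @start, SYSDATETIME()) AS elapsed_ms;

-- Row-by-row loop into the clustered table.
SET @start = SYSDATETIME();
SET @i = 0;
WHILE @i < 10000
BEGIN
    INSERT INTO dbo.InsertTest_Clustered (Payload) VALUES ('x');
    SET @i += 1;
END;
SELECT 'clustered - loop' AS test,
       DATEDIFF(MILLISECOND, @start, SYSDATETIME()) AS elapsed_ms;

-- One statement inserting 10,000 rows into the heap.
-- The cross join is only a convenient row source.
SET @start = SYSDATETIME();
INSERT INTO dbo.InsertTest_Heap (Payload)
SELECT TOP (10000) 'x'
FROM sys.all_objects AS a
CROSS JOIN sys.all_objects AS b;
SELECT 'heap - one statement' AS test,
       DATEDIFF(MILLISECOND, @start, SYSDATETIME()) AS elapsed_ms;

-- One statement inserting 10,000 rows into the clustered table.
SET @start = SYSDATETIME();
INSERT INTO dbo.InsertTest_Clustered (Payload)
SELECT TOP (10000) 'x'
FROM sys.all_objects AS a
CROSS JOIN sys.all_objects AS b;
SELECT 'clustered - one statement' AS test,
       DATEDIFF(MILLISECOND, @start, SYSDATETIME()) AS elapsed_ms;
GO
```

Running each test several times, and against tables that already hold a realistic number of rows, should give more trustworthy numbers than a single pass on empty tables.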


A keyless table? Not even an auto-incrementing surrogate key? :(

As long as the key grows monotonically, index maintenance on insert should be fine: it just "appends at the end". "Clustered" simply means that the physical layout of the table follows the index (because the data is part of the index). As long as the index is not fragmented (see the monotonically increasing bit), the cluster/data itself will not be logically fragmented, and this should not be a performance issue. (If there are updates, then clustering is a slightly different story: an updated record can "grow" and cause fragmentation.)

My suggestion, if this is the chosen route, is to benchmark it with realistic data and load, and then decide whether such suggestions are justified. It would be nice to hear which way this change is decided, and why.
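One way to check whether fragmentation is actually occurring is to query the index physical statistics. The sketch below assumes a hypothetical table name, dbo.AuditLog; substitute the real audit table.

```sql
-- dbo.AuditLog is a hypothetical name used only for this example.
DECLARE @object_id INT = OBJECT_ID(N'dbo.AuditLog');

IF @object_id IS NULL
    -- Guard: a NULL object id would make the function report on every table.
    PRINT 'Table not found; adjust the name above.';
ELSE
    SELECT i.name AS index_name,          -- NULL for a heap
           ps.index_type_desc,            -- HEAP, CLUSTERED INDEX, ...
           ps.avg_fragmentation_in_percent,
           ps.page_count
    FROM sys.dm_db_index_physical_stats(DB_ID(), @object_id, NULL, NULL, 'LIMITED') AS ps
    JOIN sys.indexes AS i
        ON i.object_id = ps.object_id
       AND i.index_id  = ps.index_id;
```

On an insert-only table with an ever-increasing clustered key, the fragmentation figure would be expected to stay low; a high value would suggest that something other than appends is touching the table.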

Happy coding.


In addition, any reliance on an order other than the one given by ORDER BY is a design flaw. It may work now, but it is an implementation detail and can change in subtle ways (something as simple as a different query plan). With an auto-increment key, ORDER BY DESC will always give the correct result (remember that auto-increment IDs can skip values, but unless "reset" they will always increase with insert order).
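As a minimal sketch of this point, using a temporary table with made-up names (#AuditDemo, AuditId, Note) rather than the real audit table:

```sql
-- Hypothetical demo table; the identity column stands in for the audit PK.
CREATE TABLE #AuditDemo (
    AuditId INT IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
    Note    VARCHAR(50)       NOT NULL
);

INSERT INTO #AuditDemo (Note) VALUES ('first');
INSERT INTO #AuditDemo (Note) VALUES ('second');
INSERT INTO #AuditDemo (Note) VALUES ('third');

-- No ORDER BY: the engine may return any row. Do not rely on the result.
SELECT TOP (1) AuditId, Note FROM #AuditDemo;

-- Oldest row, guaranteed: lowest identity value.
SELECT TOP (1) AuditId, Note FROM #AuditDemo ORDER BY AuditId ASC;

-- Newest row, guaranteed: highest identity value.
SELECT TOP (1) AuditId, Note FROM #AuditDemo ORDER BY AuditId DESC;

DROP TABLE #AuditDemo;
```

Only the last two queries have a defined result; the first one merely happens to return whatever the chosen plan reads first.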


My rudimentary understanding is that even INSERT operations are usually faster with a clustered index than with a heap. In addition, disk space requirements are lower with clustered indexes.

Some interesting tests / scenarios that may shed light on your specific circumstances: http://technet.microsoft.com/en-us/library/cc917672.aspx .
