SQL Server legacy database: clustered index or not?

We have a legacy database running on SQL Server (2005 and 2008).

All primary keys in tables are UniqueIdentifiers.

The tables currently have no clustered index, and we are running into performance problems on tables with only 750 thousand records. This is the first database I have worked on that uses uniqueidentifiers as the sole primary key, and I have never seen SQL Server return data so slowly.

I don't want to create a clustered index on a uniqueidentifier column, because the values are not sequential and would therefore slow the applications down on inserts.

We cannot remove the uniqueidentifier, because it is used for record identity management across remote sites.

I have thought about adding a bigint identity column to the tables, creating the clustered index on that column, and including the uniqueidentifier column.

i.e.

int identity: first column, to maintain insert speed

uniqueidentifier: kept so that the application continues to work as expected

The goal is to improve identity lookup and table join performance.

Q1: Will this improve query performance in the database or slow it down?

Q2: Is there an alternative to this that I have not listed?

Thanks, Pete

Edit: The performance problems are with fast retrieval of data through SELECT statements, especially when several of the more “transactional / changing” tables are joined together.

Edit 2: Joins between tables are generally made between primary keys and foreign keys; on tables with foreign keys, those columns are included in a non-clustered index to provide a better covering index.

There are no other values in the tables that would make a good clustered index.

I am more inclined to add an additional identity column to each of the high-load tables, and then include the current Guid PK column in the clustered index to ensure the best query performance.

Edit 3: I would estimate that 80% of queries go through the data access layer and filter only on primary and foreign keys. Generally, our data model has lazy-loaded objects that run a query when they are first accessed; these queries use the object's ID and the PK column. We also have a large number of user-driven data exclusion / inclusion queries that filter on foreign key columns, along the lines of "for type X, exclude the following IDs". The remaining 20% are WHERE clauses on enum (int) or date range columns; the system performs very few text queries.

Where possible, I have already added covering indexes for the heaviest queries, but so far I'm still disappointed by the performance. As bluefooted says, the data is being stored as a heap.

+6
sql sql-server indexing clustered-index identity-column
4 answers

If a table has no clustered index, it is stored as a heap rather than a b-tree. Access to heap data is absolutely terrible in SQL Server, so you definitely need to add a clustered index.
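To see which tables are affected, a query along these lines lists the heaps in the current database. It is only a sketch against the standard catalog views (sys.tables, sys.schemas, sys.indexes, sys.partitions), which exist in both SQL Server 2005 and 2008; the column aliases are my own.

```sql
-- List user tables that are stored as heaps (no clustered index).
-- In sys.indexes, index_id = 0 is the heap entry for a table.
SELECT s.name      AS schema_name,
       t.name      AS table_name,
       SUM(p.rows) AS row_count
FROM sys.tables AS t
JOIN sys.schemas AS s
     ON s.schema_id = t.schema_id
JOIN sys.indexes AS i
     ON i.object_id = t.object_id
    AND i.index_id = 0
JOIN sys.partitions AS p
     ON p.object_id = i.object_id
    AND p.index_id = i.index_id
GROUP BY s.name, t.name
ORDER BY row_count DESC;
```

Sorting by row count puts the largest heaps first, which are usually the ones worth fixing first.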

I agree with your analysis that the GUID column is a poor choice for clustering, especially since you have no way of using NEWSEQUENTIALID(). You can create a new artificial integer key if you want, but if there is another column or combination of columns that would make sense as a clustered index, that would work too.

Do you have a field that is often used for range scans? Which columns are used for joins? Is there a combination of columns that uniquely identifies a row, apart from the GUID? Posting a sample of your data model would help us suggest a good candidate for clustering.
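A minimal sketch of the artificial-key approach, assuming a hypothetical table dbo.Orders whose GUID primary key is already enforced by a nonclustered index. The table, column and index names are made up for illustration.

```sql
-- Hypothetical table dbo.Orders: the GUID primary key stays as it is (nonclustered).
-- Add an ever-increasing surrogate column; existing rows are numbered automatically.
ALTER TABLE dbo.Orders
    ADD OrderSeq BIGINT IDENTITY(1, 1) NOT NULL;

-- Cluster on the narrow, static, ever-increasing column.
-- Creating a clustered index on a heap also rebuilds every nonclustered index on the table.
CREATE UNIQUE CLUSTERED INDEX CIX_Orders_OrderSeq
    ON dbo.Orders (OrderSeq);
```

Both statements rewrite the table, so on large tables they are best run in a maintenance window. Lookups by GUID would still go through the nonclustered primary key index and then into the clustered index, so it is worth measuring before and after.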

+4

I don’t know where your GUIDs come from, but if they are generated at insert time, using NEWSEQUENTIALID() in SQL Server instead of NEWID() will help you avoid fragmentation problems during inserts.

Regarding the choice of a clustered index, as Kimberly L. Tripp describes here: “the most important factors when choosing a clustered index are that it is unique, narrow and static (ever-increasing has other advantages to minimize splits).” A GUID (16 bytes) does not meet the narrow requirement when compared with an INT (4 bytes) or even a BIGINT (8 bytes).

Kimberly also has an excellent article on GUIDs as PRIMARY KEYs and/or the clustering key.
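For reference, NEWSEQUENTIALID() can only be used in a DEFAULT constraint on a uniqueidentifier column, so the key has to be generated by the server rather than by the client. A sketch, with a hypothetical dbo.Customer table and made-up constraint names:

```sql
-- Hypothetical table: the server generates the GUID through a DEFAULT constraint.
CREATE TABLE dbo.Customer
(
    CustomerId UNIQUEIDENTIFIER NOT NULL
        CONSTRAINT DF_Customer_CustomerId DEFAULT NEWSEQUENTIALID(),
    Name       NVARCHAR(100)    NOT NULL,
    CONSTRAINT PK_Customer PRIMARY KEY CLUSTERED (CustomerId)
);

-- Leave the key out of the INSERT so the default supplies it;
-- the OUTPUT clause hands the generated value back to the caller.
INSERT INTO dbo.Customer (Name)
OUTPUT inserted.CustomerId
VALUES (N'Example');
```

This only helps if the application can accept keys generated by the database; if the remote sites create their own GUIDs, it does not apply as written.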

+2

It isn't 100% clear to me: does your number one access pattern query the tables by GUID or by other columns? And when joining to other tables, which columns (and data types) are used most often?

I can’t give you any solid recommendations until I understand more about how these GUIDs are used. I understand that you said they are primary keys, but that does not guarantee that they are used as the primary conditions in queries or in joins.

UPDATE

Now that I know a little more, I have a crazy suggestion. Cluster those tables on the GUID, but set the fill factor to 60%. This will mitigate the page-split problem and give you better query performance on those puppies.

As for using Guid.NewGuid(), it seems that you can generate sequential GUIDs in C# after all. I found the following code here on SO:
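A sketch of what the fill factor idea could look like in T-SQL, assuming a hypothetical heap dbo.Orders whose GUID column OrderId already has a nonclustered primary key. The index name is made up. Note that this adds a second index on the same column; the alternative is to drop the primary key and recreate it as clustered, which also means dropping and recreating the foreign keys that reference it.

```sql
-- Hypothetical names. Cluster on the GUID, leaving 40% free space in each leaf page
-- so that random inserts have room before a page has to split.
CREATE UNIQUE CLUSTERED INDEX CIX_Orders_OrderId
    ON dbo.Orders (OrderId)
    WITH (FILLFACTOR = 60);

-- The fill factor is applied only when the index is created or rebuilt,
-- so the free space has to be restored by a periodic rebuild.
ALTER INDEX CIX_Orders_OrderId ON dbo.Orders
    REBUILD WITH (FILLFACTOR = 60);
```

The trade-off is that pages which are only 60% full take more disk and memory and make range scans read more pages.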

    // Requires: using System; using System.Runtime.InteropServices;
    [DllImport("rpcrt4.dll", SetLastError = true)]
    static extern int UuidCreateSequential(out Guid guid);

    // Returns a sequential GUID, or a random one if the native call fails.
    public static Guid SequentialGuid()
    {
        const int RPC_S_OK = 0;
        Guid g;
        if (UuidCreateSequential(out g) != RPC_S_OK)
            return Guid.NewGuid();
        else
            return g;
    }

NEWSEQUENTIALID() is essentially a wrapper around UuidCreateSequential (with some byte shuffling applied). I’m sure that if you can’t use it directly on the client, you can find a way to make a quick round trip to the server to get a new sequential ID, possibly even with a “distributor” table and a stored procedure to do the job.

+1

You don't say what your performance problems are. If the worst-performing operation is INSERT, then your solution may be the right one. If it is something else, I would look at how a clustered index could help.

You could look at the existing indexes on the table and the queries that use them. You may be able to pick an index that degrades INSERT slightly but provides a big benefit in the areas that currently perform poorly.
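One way to check how the existing indexes are actually used is the sys.dm_db_index_usage_stats view, available from SQL Server 2005. The query below is only a sketch; the counters are reset when the instance restarts, so they reflect activity since the last restart.

```sql
-- Reads (seeks, scans, lookups) versus writes (updates) per index in the current database.
-- The LEFT JOIN keeps indexes with no recorded activity; their counters show as NULL.
SELECT OBJECT_NAME(i.object_id) AS table_name,
       i.name                   AS index_name,
       i.type_desc,
       us.user_seeks,
       us.user_scans,
       us.user_lookups,
       us.user_updates
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS us
       ON us.object_id   = i.object_id
      AND us.index_id    = i.index_id
      AND us.database_id = DB_ID()
WHERE OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
ORDER BY table_name, i.index_id;
```

An index with many updates but few seeks or scans costs more on INSERT than it gives back, while an index that is sought heavily is a hint about which columns the queries really filter on.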

0
