Uniqueidentifier PK: Is a SQL Server heap the right choice?

OK, I have read things here and there about SQL Server heaps, but nothing definitive enough to really guide me. I am going to measure performance, but was hoping for some guidance on what I should be looking into. This is SQL Server 2008 Enterprise. Here are the tables:

Job

  • JobID (PK, GUID created externally)
  • StartDate (datetime2)
  • AccountId
  • Several additional accounting fields, mainly decimal and large strings

JobSteps

  • JobStepID (PK, GUID created externally)
  • JobID FK
  • StartDate
  • Several additional accounting fields, mainly decimal and large strings

Usage: Many inserts (hundreds / sec), usually 1 JobStep for each job. Estimate perhaps 100-200 M rows per month. No updates at all, and the only deletion is archiving data older than 3 months.

About 10 queries / sec run against the data. Some join JobSteps to Job, some look only at Job. Almost all queries filter on a StartDate range; most of them also include AccountId and some of the other accounting fields (we have indexes on them). The queries are pretty simple: the largest part of each execution plan is the join to JobSteps.

The priority is insert performance. Some lag (5 minutes or so) is tolerable before data shows up in queries, so replicating to other servers and running the queries there is certainly an option.

Lookups by GUID are very rare, apart from joining JobSteps to Job.

Current setup: no clustered index. The only column that seems like a candidate is StartDate, but it does not increase strictly. Jobs can be inserted at any point in a 3-hour window after their StartDate. This could mean that a million rows are inserted in an order that is not final.

The data size for 1 Job + 1 JobStepId with my current indexes is about 500 bytes.

Questions:

  • Is this a good use of a heap?

  • What is the effect of clustering on StartDate when it is essentially non-sequential for ~2 hours / 1 million rows? My guess is that the constant reordering would kill insert performance.

  • Should I just add bigint PKs to have smaller, ever-increasing keys? (I would still need the GUIDs for lookups.)

I read "GUIDs as PRIMARY KEYs and/or the clustering key", and it seems to suggest that even an invented key will save considerable space in the other indexes. Also, some resources suggest that heaps have performance problems in general, but I'm not sure whether that still applies in SQL Server 2008.

And again, yes, I'm going to test performance and measure. I'm just trying to get some recommendations or links to other articles so that I can make a more informed decision about which paths to consider.

+4
4 answers

Yes, heaps have problems. Your data will become logically fragmented all over the place and cannot be defragmented easily.

Imagine throwing your entire telephone directory into a bucket and then trying to find "bob smith". Compare that with using a regular telephone directory, which is effectively a clustered index on last name, first name.

The overhead of maintaining the index is trivial.

StartDate, unless it is unique, is not a good choice. A clustered index must be internally unique so that the nonclustered indexes can point to its rows. If it is not declared unique, SQL Server adds a 4-byte "uniquifier" to rows with duplicate key values.

Yes, I would use int or bigint to make this easier. As for GUIDs: see the related questions on the right side of the screen.

Edit:

Note: the PK and the clustered index are two separate things, even though SQL Server clusters the PK by default.

+5

Heap fragmentation is not necessarily the end of the world. It sounds like you will rarely be scanning the data, so it is not a major concern here.

Your nonclustered indexes are what will affect performance. Each one needs to store the address of the row in the underlying table (heap or clustered index). Ideally, your queries never have to touch the underlying table at all, because the index already stores all the information needed (it includes every required column, which makes it a covering index).

And yes, Kimberly Tripp's material is the best resource for indexes.
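To make "covering" concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module rather than SQL Server, purely because it runs anywhere; the index name, the Cost column, and the date literals are hypothetical, and SQL Server's plans and index syntax differ in detail. The idea carries over: a query that reads only columns held in the index never needs the base table.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server here, only to show the concept.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Job (
        JobID     TEXT PRIMARY KEY,   -- GUID stored as text in this sketch
        StartDate TEXT NOT NULL,
        AccountId INTEGER NOT NULL,
        Cost      REAL                -- hypothetical accounting column
    );
    -- Hypothetical index: it holds every column the first query reads.
    CREATE INDEX IX_Job_StartDate_AccountId ON Job (StartDate, AccountId);
""")

def plan(sql):
    """Return SQLite's query plan for `sql` as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column is the plan text

date_filter = "WHERE StartDate >= '2010-01-01' AND StartDate < '2010-01-02'"

# Reads only indexed columns, so the index alone can answer it.
covered_plan = plan("SELECT StartDate, AccountId FROM Job " + date_filter)
# Also reads Cost, which is not in the index, so the table must be visited.
uncovered_plan = plan("SELECT StartDate, AccountId, Cost FROM Job " + date_filter)

print(covered_plan)
print(uncovered_plan)
```

The first plan should report a covering index and the second should not. On SQL Server the equivalent check is to look for lookups against the base table in the actual execution plan.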

Rob

+3

As your own research showed, and as all the other respondents said, using a GUID as a clustered index on a table is a bad idea.

However, having a heap is also not really a good choice, since heaps have other problems, mainly related to fragmentation and other things that just don't work well with a heap.

My best-practice advice would always be this:

  • use a primary, clustering key on any data table (unless it is a temporary table or a table used for bulk loading)
  • try to make sure the clustering key is an INT IDENTITY or BIGINT IDENTITY

I would argue that the benefits of adding an INT / BIGINT, even just for the sake of a good clustered index, far outweigh the drawbacks (as Kim Tripp also argues in her blog post that you cited).

Mark

+2

Since the GUID is your primary and foreign key, and the database still needs to check the constraints on every insert, you will probably need to index it. Indexing a GUID is not recommended because of its randomness. Therefore, I would go down the bigint (probably identity) route for your primary key and use that as the clustered index.
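The randomness argument can be illustrated with a rough simulation. This is only a sketch under stated assumptions, not a model of SQL Server's storage engine: it counts how many inserts carry a key lower than the highest key already present, meaning the row cannot simply be appended at the end of an index ordered on that key. The arrival pattern (one row per second, StartDate up to 3 hours in the past), the row count, and the function name are all hypothetical.

```python
import random

def mid_index_inserts(keys):
    """Count inserts whose key is below the largest key seen so far.

    Such a row cannot be appended at the end of an index ordered on that
    key; it has to go into an existing page, which is where page splits
    come from.
    """
    count = 0
    highest = None
    for key in keys:
        if highest is not None and key < highest:
            count += 1
        else:
            highest = key
    return count

rng = random.Random(42)  # fixed seed so the run is repeatable
n = 100_000              # hypothetical number of inserts

# Ever-increasing surrogate key, as a BIGINT IDENTITY would produce.
identity_keys = list(range(n))
# Random 128-bit integers standing in for externally created GUIDs.
guid_keys = [rng.getrandbits(128) for _ in range(n)]
# Assumed arrival pattern: one row per second, StartDate up to 3 hours earlier.
start_dates = [t - rng.uniform(0, 3 * 3600) for t in range(n)]

identity_mid = mid_index_inserts(identity_keys)
guid_mid = mid_index_inserts(guid_keys)
start_date_mid = mid_index_inserts(start_dates)

for name, mid in [("identity", identity_mid),
                  ("guid", guid_mid),
                  ("start date", start_date_mid)]:
    print(f"{name:>10}: {mid / n:.1%} of inserts land mid-index")
```

Under these assumptions the identity key never inserts mid-index, while both the random keys and the lagging StartDate do so for the large majority of rows. How much that costs on real hardware still has to be measured.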

+1
