OK, I've read a few things here about SQL Server heaps, but nothing definitive enough to really guide me. I will measure performance, but I was hoping for some guidance on what to look into. This is SQL Server 2008 Enterprise. Here are the tables:
Job
- JobID (PK, GUID created externally)
- StartDate (datetime2)
- AccountId
- Several additional accounting fields, mainly decimal and large strings
JobSteps
- JobStepID (PK, GUID created externally)
- JobID (FK to Job)
- StartDate
- Several additional accounting fields, mainly decimal and large strings
Usage: Lots of inserts (hundreds per second), usually one JobStep per Job. I estimate roughly 100-200 million rows per month. There are no updates at all, and the only deletes come from archiving data older than 3 months.
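As a sanity check on these numbers, here is a quick sketch of the average insert rate the monthly estimate implies. It assumes a 30-day month and perfectly even traffic over the day, which real workloads will not have.

```python
# Back-of-envelope check: what average insert rate does 100-200 M rows/month imply?
# Assumptions: a 30-day month and inserts spread evenly over the day
# (real traffic is burstier, so peaks well above the average are expected).
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # 2,592,000 seconds


def average_rows_per_second(rows_per_month: float) -> float:
    """Average rows inserted per second for a given monthly volume."""
    return rows_per_month / SECONDS_PER_MONTH


for rows in (100_000_000, 200_000_000):
    print(f"{rows:>12,} rows/month -> {average_rows_per_second(rows):6.1f} rows/sec average")
```

That works out to roughly 39-77 rows per second on average, so "hundreds per second" presumably describes peak load rather than the steady state.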
Up to ~10 queries per second run against the data. Some join JobSteps to Job, some only look at Job. Almost all queries filter on a StartDate range, and most of them also include AccountId and some other accounting fields (we have indexes on those). The queries are pretty simple: most of the execution plan cost is the join to JobSteps.
Insert performance is the priority. Some lag (5 minutes or so) is acceptable before new data shows up in queries, so replicating to other servers and running the queries there is certainly an option.
Lookups by GUID are very rare, except for joining JobSteps to Job.
Current setup: no clustered index. The only candidate that seems to make sense is StartDate, but it isn't ever-increasing: a Job can be inserted at any point in the 3-hour window after its StartDate. That could mean around a million rows being inserted out of their final order.
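To get a feel for how out of order these inserts are, here is a toy simulation. It is a sketch under simplifying assumptions: a steady 100 rows/sec, and each StartDate falling uniformly at random 0-3 hours before its insert time; `simulate` and its parameters are made up for illustration. It counts how many inserts would land behind the highest StartDate already stored, i.e. rows a StartDate-clustered index would have to place among existing keys rather than append at the end.

```python
import random

# Toy model (assumption): rows arrive at a steady rate, and each row's StartDate
# lies a uniformly random 0-3 hours before the moment it is inserted.
WINDOW_SECONDS = 3 * 60 * 60


def simulate(rows: int, rows_per_second: float, seed: int = 42) -> float:
    """Return the fraction of inserts whose StartDate is below the current maximum."""
    rng = random.Random(seed)
    max_start = float("-inf")
    out_of_order = 0
    for i in range(rows):
        insert_time = i / rows_per_second
        start_date = insert_time - rng.uniform(0, WINDOW_SECONDS)
        if start_date < max_start:
            out_of_order += 1  # key sorts before at least one existing key
        else:
            max_start = start_date  # a genuine append at the end of the key order
    return out_of_order / rows


rate = 100.0
print(f"rows inside one 3-hour window at {rate:.0f}/sec: {rate * WINDOW_SECONDS:,.0f}")
print(f"fraction of inserts that are not appends: {simulate(100_000, rate):.4f}")
```

At 100 rows/sec a 3-hour window holds about 1.08 million rows, in line with the "million rows" figure above, and in this model almost every insert sorts before existing keys. It says nothing directly about page splits, which also depend on fill factor and page layout, so treat it only as an illustration of the ordering.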
The data size for one Job plus one JobStep, including my current indexes, is about 500 bytes.
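A rough retention-size estimate from these figures, as a sketch assuming that the 100-200 M rows per month are Job rows each paired with one JobStep, about 500 bytes per pair including indexes, and 3 months kept online. Fill factor, fragmentation and the transaction log are ignored.

```python
# Rough online-storage estimate (assumptions: ~500 bytes per Job + JobStep pair
# including indexes, 100-200 M pairs per month, 3 months kept before archiving;
# ignores fill factor, fragmentation and log space).
BYTES_PER_PAIR = 500
MONTHS_RETAINED = 3


def retained_gb(pairs_per_month: int) -> float:
    """Approximate data kept online, in decimal gigabytes."""
    return pairs_per_month * MONTHS_RETAINED * BYTES_PER_PAIR / 1e9


for pairs in (100_000_000, 200_000_000):
    print(f"{pairs:,} pairs/month -> ~{retained_gb(pairs):.0f} GB online")
```

Under those assumptions the online data comes to roughly 150-300 GB before any overhead.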
Questions:
Is this a good use of a heap?
What is the effect of clustering on StartDate when the inserts are out of order across ~2 hours / 1 million rows? I assume the constant reordering would kill insert performance.
Should I just add a bigint PK simply to have smaller, ever-increasing keys? (I would still need indexes for searching.)
I've read up on GUIDs as primary keys and/or clustering keys, and it seems that even making up a surrogate key would save significant space in the other indexes. Also, some sources suggest that heaps have performance problems in general, but I'm not sure whether that still applies to SQL Server 2008.
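To put a number on the key-width question, this sketch compares the row locator SQL Server stores in every nonclustered index entry: an 8-byte RID (file, page, slot) for a heap, or the clustering key for a clustered table (uniqueidentifier is 16 bytes, bigint is 8). The index count and row count below are hypothetical placeholders, not measurements of this schema.

```python
# Per-row locator stored in each nonclustered (NC) index entry:
# - heap: an 8-byte RID (file id, page id, slot)
# - clustered table: the clustering key columns
# NONCLUSTERED_INDEXES and ROWS are hypothetical placeholders, not measured values.
LOCATOR_BYTES = {
    "heap (RID)": 8,
    "clustered on bigint": 8,
    "clustered on GUID": 16,
}
NONCLUSTERED_INDEXES = 4       # hypothetical NC index count per table
ROWS = 600_000_000             # e.g. 3 months x 200 M rows


def locator_overhead_gb(locator_bytes: int) -> float:
    """Bytes spent on row locators across all NC indexes, in decimal GB."""
    return locator_bytes * NONCLUSTERED_INDEXES * ROWS / 1e9


for name, size in LOCATOR_BYTES.items():
    print(f"{name:<22} {size:>2} B/row -> ~{locator_overhead_gb(size):.1f} GB across NC indexes")
```

Under these assumptions a bigint clustering key halves the locator overhead compared with clustering on the GUID, but it is the same width as the heap's RID, so compared with the current heap it would not shrink the nonclustered indexes. A non-unique clustered index, such as one on StartDate, also adds a hidden 4-byte uniquifier to rows with duplicate key values.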
And again, yes, I'm going to test performance and measure. I'm just looking for some recommendations or links to other articles so I can make a more informed decision about which paths to consider.