SQL Server: primary key problem on a partitioned table

I am creating a table that will be partitioned and will contain a FILESTREAM column. The problem I am facing is that it seems I must have a composite primary key (FILE_ID and FILE_UPLOADED_DATE), because FILE_UPLOADED_DATE is the partitioning column of my partition scheme. Is that right? I would prefer not to have a composite key and to have just FILE_ID as the primary key. Could this just be user error?

Any suggestions would be appreciated.

Version: SQL Server 2008 R2

Partition function and schemes:

    CREATE PARTITION FUNCTION DocPartFunction (datetime)
    AS RANGE RIGHT FOR VALUES ('20101220')
    GO
    CREATE PARTITION SCHEME DocPartScheme
    AS PARTITION DocPartFunction
    TO (DATA_FG_20091231, DATA_FG_20101231);
    GO
    CREATE PARTITION SCHEME DocFSPartScheme
    AS PARTITION DocPartFunction
    TO (FS_FG_20091231, FS_FG_20101231);
    GO

Create statement:

    CREATE TABLE [dbo].[FILE](
        [FILE_ID] [int] IDENTITY(1,1) NOT NULL,
        [DOCUMENT] [varbinary](max) FILESTREAM NULL,
        [FILE_UPLOADED_DATE] [datetime] NOT NULL,
        [FILE_INT] [int] NOT NULL,
        [FILE_EXTENSION] [varchar](10) NULL,
        [DocGUID] [uniqueidentifier] ROWGUIDCOL NOT NULL UNIQUE ON [PRIMARY],
        CONSTRAINT [PK_File] PRIMARY KEY CLUSTERED ( [FILE_ID] ASC )
            ON DocPartScheme ([FILE_UPLOADED_DATE])
    ) ON DocPartScheme ([FILE_UPLOADED_DATE])
    FILESTREAM_ON DocFSPartScheme;

Error if I do not include FILE_UPLOADED_DATE in the primary key:

    Msg 1908, Level 16, State 1, Line 1
    Column 'FILE_UPLOADED_DATE' is partitioning column of the index 'PK_File'. Partition columns for a unique index must be a subset of the index key.
    Msg 1750, Level 16, State 0, Line 1
    Could not create constraint. See previous errors.

Thanks!

+4
3 answers

You are confusing the primary key with the clustered index. There is no reason these two have to be one and the same. You can have a clustered index on FILE_UPLOADED_DATE and a separate, nonclustered primary key on FILE_ID. In fact, you are already doing something similar for the DocGUID column:

    CREATE TABLE [dbo].[FILE](
        [FILE_ID] [int] IDENTITY(1,1) NOT NULL,
        [DOCUMENT] [varbinary](max) FILESTREAM NULL,
        [FILE_UPLOADED_DATE] [datetime] NOT NULL,
        [FILE_INT] [int] NOT NULL,
        [FILE_EXTENSION] [varchar](10) NULL,
        [DocGUID] [uniqueidentifier] ROWGUIDCOL NOT NULL,
        constraint UniqueDocGUID UNIQUE NONCLUSTERED ([DocGUID]) ON [PRIMARY])
    ON DocPartScheme ([FILE_UPLOADED_DATE])
    FILESTREAM_ON DocFSPartScheme;

    CREATE CLUSTERED INDEX cdx_File ON [FILE] (FILE_UPLOADED_DATE)
    ON DocPartScheme ([FILE_UPLOADED_DATE])
    FILESTREAM_ON DocFSPartScheme;

    ALTER TABLE [dbo].[FILE]
    ADD CONSTRAINT PK_File PRIMARY KEY NONCLUSTERED (FILE_ID)
    ON [PRIMARY];

However, such a design leads to non-aligned indexes, which can cause very serious performance problems and also block all fast partition switching operations. See Special Guidelines for Partitioned Indexes:

Each sort table requires a minimum amount of memory to build. When you are building a partitioned index that is aligned with its base table, sort tables are built one at a time, using less memory. However, when you are building a non-aligned partitioned index, the sort tables are built at the same time.

As a result, there must be sufficient memory to handle these concurrent sorts. The larger the number of partitions, the more memory is required. The minimum size for each sort table, for each partition, is 40 pages, with 8 kilobytes per page. For example, a non-aligned partitioned index with 100 partitions requires sufficient memory to serially sort 4,000 (40 * 100) pages at the same time. If this memory is available, the build operation will succeed, but performance may suffer. If this memory is not available, the build operation will fail.
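To put the quoted figures into more familiar units, the arithmetic can be sketched in T-SQL. The variable names are only illustrative, and the partition count of 100 is the example value from the quote, not a property of the table in the question:

```sql
-- Minimum sort memory for a non-aligned index build, using the quoted figures.
DECLARE @partitions     int = 100;  -- example partition count from the quote
DECLARE @pages_per_sort int = 40;   -- minimum pages per sort table
DECLARE @kb_per_page    int = 8;    -- page size in KB

SELECT
    @partitions * @pages_per_sort                         AS min_pages,  -- 4000
    @partitions * @pages_per_sort * @kb_per_page          AS min_kb,     -- 32000
    @partitions * @pages_per_sort * @kb_per_page / 1024.0 AS min_mb;     -- about 31.25
```

So 100 partitions need roughly 31 MB just for the minimum sort tables, and the requirement grows linearly with the number of partitions.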

Your design already has a non-aligned index for DocGUID, so the performance problems are likely to be present already. If you want your indexes to be aligned, you must accept one of the side effects of choosing a partition scheme: you can no longer have a logical primary key, or enforce unique constraints, unless the key contains the partitioning column.
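One way to see which indexes are candidates for being non-aligned is to list where each index on the table is stored. This is only a rough sketch using the catalog views sys.indexes and sys.data_spaces. It assumes the table name dbo.[FILE] from the question, and it treats any index that lives on a plain filegroup instead of a partition scheme as non-aligned:

```sql
-- List each index on dbo.[FILE] together with the data space it is stored on.
-- An index on a partitioned table whose data space is a ROWS_FILEGROUP
-- (rather than a PARTITION_SCHEME) is not partitioned, and so not aligned.
SELECT
    i.name       AS index_name,
    i.type_desc  AS index_type,
    i.is_unique,
    ds.name      AS data_space_name,
    ds.type_desc AS data_space_type
FROM sys.indexes AS i
JOIN sys.data_spaces AS ds
    ON ds.data_space_id = i.data_space_id
WHERE i.object_id = OBJECT_ID(N'dbo.[FILE]');
```

With the design from the question, the unique index backing DocGUID would be expected to show up on [PRIMARY], while the clustered index shows up on DocPartScheme.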

And finally, you need to ask: why use a partitioned table at all? They are always slower than the non-partitioned alternative. Unless you need fast partition switching operations for ETL (which you have already given up because of the non-aligned index on DocGUID), there is basically no incentive to use a partitioned table. (Preemptive comment: a clustered index on FILE_UPLOADED_DATE is guaranteed to be a better alternative than "partition elimination".)

+8

The partitioning column must always be present in the clustered index of a partitioned table. Any workaround you come up with has to take this into account.
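For illustration, this is the question's own CREATE TABLE statement with the partitioning column added to the clustered primary key, which is the form the question implies does not raise error 1908. It is a sketch that assumes the partition schemes DocPartScheme and DocFSPartScheme from the question already exist:

```sql
-- Composite clustered primary key: the partitioning column
-- FILE_UPLOADED_DATE is part of the unique index key, so the
-- index can be created on (and aligned with) DocPartScheme.
CREATE TABLE [dbo].[FILE](
    [FILE_ID] [int] IDENTITY(1,1) NOT NULL,
    [DOCUMENT] [varbinary](max) FILESTREAM NULL,
    [FILE_UPLOADED_DATE] [datetime] NOT NULL,
    [FILE_INT] [int] NOT NULL,
    [FILE_EXTENSION] [varchar](10) NULL,
    -- Kept as in the question: this unique index lives on [PRIMARY]
    -- and is therefore not aligned with the table.
    [DocGUID] [uniqueidentifier] ROWGUIDCOL NOT NULL UNIQUE ON [PRIMARY],
    CONSTRAINT [PK_File] PRIMARY KEY CLUSTERED
        ( [FILE_ID] ASC, [FILE_UPLOADED_DATE] ASC )
        ON DocPartScheme ([FILE_UPLOADED_DATE])
) ON DocPartScheme ([FILE_UPLOADED_DATE])
FILESTREAM_ON DocFSPartScheme;
```

Note that this key only guarantees uniqueness of the pair (FILE_ID, FILE_UPLOADED_DATE). FILE_ID on its own stays unique in practice only because it is an IDENTITY column that nobody overrides.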

+4

I know this is an old question, but maybe Google leads someone else to this question:

A possible solution is to partition not by the date column but by FILE_ID. Every day / week / month (or whatever period you use), run an agent job at midnight that takes MAX(FILE_ID) WHERE FILE_UPLOADED_DATE < GETDATE(), adds the next filegroup to the partition scheme, and splits the partition function at MaxID + 1.
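The body of such a job might look roughly like the sketch below. All object names here are hypothetical and not from the question: it assumes an int partition function DocIdPartFunction (RANGE RIGHT), two schemes DocIdPartScheme and DocIdFSPartScheme built on it, and filegroups DATA_FG_NEXT and FS_FG_NEXT that have already been created for the new period.

```sql
-- Hypothetical names: DocIdPartFunction, DocIdPartScheme,
-- DocIdFSPartScheme, DATA_FG_NEXT, FS_FG_NEXT.
DECLARE @boundary int;

-- First FILE_ID that belongs to the new period.
SELECT @boundary = MAX([FILE_ID]) + 1
FROM [dbo].[FILE]
WHERE [FILE_UPLOADED_DATE] < GETDATE();

IF @boundary IS NOT NULL  -- skip the split when the table is still empty
BEGIN
    -- Every scheme that uses the function needs a NEXT USED filegroup
    -- before the function can be split.
    ALTER PARTITION SCHEME DocIdPartScheme   NEXT USED [DATA_FG_NEXT];
    ALTER PARTITION SCHEME DocIdFSPartScheme NEXT USED [FS_FG_NEXT];

    -- With RANGE RIGHT, rows with FILE_ID >= @boundary go to the new partition.
    ALTER PARTITION FUNCTION DocIdPartFunction() SPLIT RANGE (@boundary);
END
```

Because the split point lies above every existing FILE_ID, the new partition starts out empty, so the split should not have to move existing rows.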

Of course, you will still have the problem of a non-aligned index on DocGUID, unless you also add FILE_ID to that unique index (which may allow non-unique DocGUID values) and / or check its uniqueness in an insert / update trigger.
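A uniqueness check in a trigger could be sketched as follows. The trigger name and error text are made up for illustration, and this is not a full replacement for a unique constraint: under the default isolation level, two concurrent transactions inserting the same DocGUID may both pass the check.

```sql
-- Hypothetical trigger: rejects any insert or update that leaves
-- more than one row with the same DocGUID in dbo.[FILE].
CREATE TRIGGER [dbo].[trg_File_DocGUID_Unique]
ON [dbo].[FILE]
AFTER INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    -- The table already contains the new rows at this point, so a
    -- count above 1 for any affected DocGUID means a duplicate exists.
    IF EXISTS (
        SELECT f.[DocGUID]
        FROM [dbo].[FILE] AS f
        WHERE f.[DocGUID] IN (SELECT i.[DocGUID] FROM inserted AS i)
        GROUP BY f.[DocGUID]
        HAVING COUNT(*) > 1
    )
    BEGIN
        RAISERROR('Duplicate DocGUID detected.', 16, 1);
        ROLLBACK TRANSACTION;
    END
END;
```

Without a supporting index on DocGUID, this lookup scans the table on every insert or update, so an index that leads with DocGUID is still advisable.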

0
