Uniqueness of several large URL fields in MS SQL

I have a table with the following definition:

    CREATE TABLE url_tracker (
        id           int not null identity(1, 1),
        active       bit not null,
        install_date int not null,
        partner_url  nvarchar(512) not null,
        local_url    nvarchar(512) not null,
        public_url   nvarchar(512) not null,
        primary key(id)
    );

I have a requirement that these three URLs are unique together: any single URL can appear many times, but the combination of the three must be unique for a given day.

Initially, I thought I could do this:

    CREATE UNIQUE INDEX uniques ON url_tracker (install_date, partner_url, local_url, public_url);

However, this returns a warning:

    Warning! The maximum key length is 900 bytes. The index 'uniques' has maximum length of 3076 bytes. For some combination of large values, the insert/update operation will fail.
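The 3076 in the warning is simply the sum of the declared key column widths: int takes 4 bytes, and nvarchar stores 2 bytes per character, so each nvarchar(512) column can take up to 1024 bytes, giving 4 + 3 * 1024 = 3076. A minimal sketch that reads the same figures from the catalog, using a throwaway temporary copy of the table named #url_tracker (a hypothetical name, so no real table is touched):

```sql
-- Throwaway copy of the question's table (hypothetical name #url_tracker)
CREATE TABLE #url_tracker (
    id           int NOT NULL IDENTITY(1, 1) PRIMARY KEY,
    active       bit NOT NULL,
    install_date int NOT NULL,
    partner_url  nvarchar(512) NOT NULL,
    local_url    nvarchar(512) NOT NULL,
    public_url   nvarchar(512) NOT NULL
);

-- max_length is reported in bytes: 4 for int, 1024 for nvarchar(512)
SELECT name, max_length
FROM tempdb.sys.columns
WHERE object_id = OBJECT_ID(N'tempdb..#url_tracker')
  AND name IN (N'install_date', N'partner_url', N'local_url', N'public_url');

-- Total for the proposed index key: 4 + 3 * 1024 = 3076, far above 900
SELECT SUM(max_length) AS key_bytes
FROM tempdb.sys.columns
WHERE object_id = OBJECT_ID(N'tempdb..#url_tracker')
  AND name IN (N'install_date', N'partner_url', N'local_url', N'public_url');
```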

Digging around, I learned about the INCLUDE argument for CREATE INDEX, but according to this question, rewriting the index with INCLUDE will not enforce uniqueness on the URLs, because included columns are not part of the index key:

    CREATE UNIQUE INDEX uniques ON url_tracker (install_date) INCLUDE (partner_url, local_url, public_url);
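INCLUDE columns are stored at the leaf level of the index but are not part of its key, so a unique index declared this way constrains install_date alone: a second row for the same day is rejected even if all three URLs differ. A small sketch that shows this, using a hypothetical scratch table #demo:

```sql
-- Hypothetical scratch table mirroring the relevant columns of url_tracker
CREATE TABLE #demo (
    install_date int NOT NULL,
    partner_url  nvarchar(512) NOT NULL,
    local_url    nvarchar(512) NOT NULL,
    public_url   nvarchar(512) NOT NULL
);
CREATE UNIQUE INDEX uniques ON #demo (install_date)
    INCLUDE (partner_url, local_url, public_url);

INSERT #demo VALUES (20110915, N'http://a.example/', N'http://b.example/', N'http://c.example/');

BEGIN TRY
    -- Completely different URLs on the same day: still a duplicate key,
    -- because only install_date is part of the unique key
    INSERT #demo VALUES (20110915, N'http://x.example/', N'http://y.example/', N'http://z.example/');
END TRY
BEGIN CATCH
    -- Reports error 2601 (cannot insert duplicate key row ... with unique index)
    SELECT ERROR_NUMBER() AS err, ERROR_MESSAGE() AS msg;
END CATCH;
```

Only the first row ends up in #demo; the temporary table is dropped automatically when the session ends.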

How can I ensure the uniqueness of several relatively large nvarchar fields?


Resolution

So, based on the comments, the answers, and some further research, I can do this:

    CREATE TABLE url_tracker (
        id           int not null identity(1, 1),
        active       bit not null,
        install_date int not null,
        partner_url  nvarchar(512) not null,
        local_url    nvarchar(512) not null,
        public_url   nvarchar(512) not null,
        uniquehash AS HashBytes('SHA1', partner_url + local_url + public_url) PERSISTED,
        primary key(id)
    );
    CREATE UNIQUE INDEX uniques ON url_tracker (install_date, uniquehash);
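Two details in this hash are worth checking. First, concatenating the raw strings loses the boundaries between the columns, so different rows can feed identical input to HashBytes: ('ab', 'c', 'x') and ('a', 'bc', 'x') both become 'abcx'. Second, HASHBYTES is typed varbinary(8000), so an index on its raw result is likely to repeat the 900-byte warning even though a SHA1 value is only 20 bytes; the second answer wraps its hash in CONVERT to a fixed length. A sketch of one way to address both, which is an assumption rather than the asker's code: hash each URL separately and convert the three 20-byte digests to a fixed binary(60). The table name #url_tracker_v2 is hypothetical.

```sql
-- The column boundaries disappear when the strings are concatenated:
SELECT CASE
         WHEN HASHBYTES('SHA1', N'ab' + N'c' + N'x')
            = HASHBYTES('SHA1', N'a' + N'bc' + N'x')
         THEN 'collision' ELSE 'distinct'
       END AS concatenated_hash;  -- 'collision'

-- Hypothetical variant: one SHA1 digest per column, fixed at 3 * 20 = 60 bytes
CREATE TABLE #url_tracker_v2 (
    id           int NOT NULL IDENTITY(1, 1) PRIMARY KEY,
    active       bit NOT NULL,
    install_date int NOT NULL,
    partner_url  nvarchar(512) NOT NULL,
    local_url    nvarchar(512) NOT NULL,
    public_url   nvarchar(512) NOT NULL,
    uniquehash AS CONVERT(binary(60),
                      HASHBYTES('SHA1', partner_url)
                    + HASHBYTES('SHA1', local_url)
                    + HASHBYTES('SHA1', public_url)) PERSISTED
);
-- Key is now 4 + 60 = 64 bytes, so no 900-byte warning
CREATE UNIQUE INDEX uniques ON #url_tracker_v2 (install_date, uniquehash);

-- The two rows that collided above are now both accepted
INSERT #url_tracker_v2 (active, install_date, partner_url, local_url, public_url)
VALUES (1, 20110915, N'ab', N'c', N'x'),
       (1, 20110915, N'a', N'bc', N'x');
```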

Thoughts?

2 answers

I would make a computed column with a hash of the URLs and then create a unique index / constraint on it. Consider making the hash a persisted computed column, so that it does not have to be recalculated after insertion.


Following the ideas from the conversation in the comments: assuming you can change the URL data type to VARCHAR(900) (or NVARCHAR(450) if you really think you need Unicode URLs) and can accept that length limit, this solution may work. It also assumes SQL Server 2008 or later. Please always state which version you are using; the product name alone is not specific enough, because solutions can be highly version dependent.

Setup:

    USE tempdb;
    GO
    CREATE TABLE dbo.urls (
        id  INT IDENTITY(1,1) PRIMARY KEY,
        url VARCHAR(900) NOT NULL UNIQUE
    );
    CREATE TABLE dbo.url_tracker (
        id             INT IDENTITY(1,1) PRIMARY KEY,
        active         BIT NOT NULL DEFAULT 1,
        install_date   DATE NOT NULL DEFAULT CURRENT_TIMESTAMP,
        partner_url_id INT NOT NULL REFERENCES dbo.urls(id),
        local_url_id   INT NOT NULL REFERENCES dbo.urls(id),
        public_url_id  INT NOT NULL REFERENCES dbo.urls(id),
        CONSTRAINT unique_urls UNIQUE (install_date, partner_url_id, local_url_id, public_url_id)
    );

Insert a few URLs:

    INSERT dbo.urls(url) VALUES
        ('http://msn.com/'),
        ('http://aol.com/'),
        ('http://yahoo.com/'),
        ('http://google.com/'),
        ('http://gmail.com/'),
        ('http://stackoverflow.com/');

Now insert some data:

    -- succeeds:
    INSERT dbo.url_tracker(partner_url_id, local_url_id, public_url_id)
        VALUES (1,2,3), (2,3,4), (3,4,5), (4,5,6);

    -- fails:
    INSERT dbo.url_tracker(partner_url_id, local_url_id, public_url_id)
        VALUES (1,2,3);
    GO

    /*
    Msg 2627, Level 14, State 1, Line 3
    Violation of UNIQUE KEY constraint 'unique_urls'. Cannot insert duplicate key in object 'dbo.url_tracker'. The duplicate key value is (2011-09-15, 1, 2, 3).
    The statement has been terminated.
    */

    -- succeeds, since it is for a different day:
    INSERT dbo.url_tracker(install_date, partner_url_id, local_url_id, public_url_id)
        VALUES ('2011-09-01',1,2,3);
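In practice callers usually have the URL text rather than ids. A minimal sketch, assuming the dbo.urls and dbo.url_tracker tables from the setup above (before the cleanup step); the variable names and the http://bing.com/ value are hypothetical inputs. It stores any missing URLs, then resolves the three ids in a single INSERT ... SELECT. With concurrent callers the NOT EXISTS check can race, so the UNIQUE constraint on url remains the real guard.

```sql
-- Hypothetical inputs; in a real application these would be parameters
DECLARE @partner_url varchar(900) = 'http://msn.com/',
        @local_url   varchar(900) = 'http://bing.com/',
        @public_url  varchar(900) = 'http://stackoverflow.com/';

-- 1. Store any URL that is not in dbo.urls yet
--    (DISTINCT in case two of the three inputs are the same URL)
INSERT dbo.urls (url)
SELECT DISTINCT v.url
FROM (VALUES (@partner_url), (@local_url), (@public_url)) AS v(url)
WHERE NOT EXISTS (SELECT 1 FROM dbo.urls AS u WHERE u.url = v.url);

-- 2. Look up the three ids and record the combination for today
--    (install_date falls back to its CURRENT_TIMESTAMP default)
INSERT dbo.url_tracker (partner_url_id, local_url_id, public_url_id)
SELECT p.id, l.id, pu.id
FROM dbo.urls AS p
CROSS JOIN dbo.urls AS l
CROSS JOIN dbo.urls AS pu
WHERE p.url  = @partner_url
  AND l.url  = @local_url
  AND pu.url = @public_url;
```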

Cleanup:

    DROP TABLE dbo.url_tracker, dbo.urls;

Now, if 900 bytes is not enough, you can slightly modify the table of URLs:

    CREATE TABLE dbo.urls (
        id       INT IDENTITY(1,1) PRIMARY KEY,
        url      VARCHAR(2048) NOT NULL,
        url_hash AS CONVERT(VARBINARY(32), HASHBYTES('SHA1', url)) PERSISTED,
        CONSTRAINT unique_url UNIQUE (url_hash)
    );
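With this variant, a lookup by URL text should be able to seek on the unique index over url_hash instead of comparing long strings. A minimal sketch, assuming the modified dbo.urls table above; @url is a hypothetical input. The variable must be varchar like the column: HASHBYTES hashes the raw bytes, so an nvarchar value (two bytes per character) produces a different digest for the same text.

```sql
-- Hypothetical input; keep it varchar so its bytes match the url column
DECLARE @url varchar(2048) = 'http://www.google.com/';

SELECT id
FROM dbo.urls
WHERE url_hash = CONVERT(VARBINARY(32), HASHBYTES('SHA1', @url))  -- same expression as the computed column
  AND url = @url;  -- confirm the stored text really is @url
```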

The rest does not need to change. If you try to insert the same URL twice, you get a similar violation:

    INSERT dbo.urls(url) SELECT 'http://www.google.com/';
    GO
    INSERT dbo.urls(url) SELECT 'http://www.google.com/';
    GO
    /*
    Msg 2627, Level 14, State 1, Line 1
    Violation of UNIQUE KEY constraint 'unique_url'. Cannot insert duplicate key in object 'dbo.urls'. The duplicate key value is (0xd111175e022c19f447895ad6b72ff259552d1b38).
    The statement has been terminated.
    */
