I have an application that processes a file, splits it into several segments, and saves the result in a SQL Server database. There are many duplicate files (possibly with different file paths), so I first go through all the files, calculate the MD5 hash of each one, and mark duplicates using the [Duplicated] column.
I run this application every day and save the results in the [Result] table. The database schema is as follows:
    CREATE TABLE [dbo].[FilePath]
    (
        [FilePath]     NVARCHAR(256) NOT NULL PRIMARY KEY,
        [FileMd5Hash]  BINARY(16)    NOT NULL,
        [Duplicated]   BIT           NOT NULL DEFAULT 0,
        [LastRunBuild] NVARCHAR(30)  NOT NULL DEFAULT 0
    )

    CREATE TABLE [dbo].[Result]
    (
        [Build]          NVARCHAR(30) NOT NULL,
        [FileMd5Hash]    BINARY(16)   NOT NULL,
        [SegmentId]      INT          NOT NULL,
        [SegmentContent] TEXT         NOT NULL,
        PRIMARY KEY ([FileMd5Hash], [Build], [SegmentId])
    )
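To make the duplicate marking concrete, here is a rough sketch of how the flag could be set once the hashes are stored. It assumes that the path that sorts first for each hash is treated as the original; that tie-breaking rule is an assumption made for illustration only.

```sql
-- Sketch: rank the paths that share a hash, then flag every path
-- after the first one as a duplicate.
-- Assumption: "first" means lowest [FilePath] in sort order.
WITH Ranked AS
(
    SELECT [Duplicated],
           ROW_NUMBER() OVER (PARTITION BY [FileMd5Hash]
                              ORDER BY [FilePath]) AS rn
    FROM [dbo].[FilePath]
)
UPDATE Ranked
SET [Duplicated] = CASE WHEN rn > 1 THEN 1 ELSE 0 END;
```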
I need to join these two tables on FileMd5Hash.
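The join in question would look roughly like the sketch below. The `Duplicated = 0` filter is an assumption, added so that each result row matches only one path; the index and its name `IX_FilePath_FileMd5Hash` are hypothetical and not part of the schema above.

```sql
-- Hypothetical supporting index on the 16-byte hash column.
CREATE NONCLUSTERED INDEX [IX_FilePath_FileMd5Hash]
    ON [dbo].[FilePath] ([FileMd5Hash]);

-- Join each stored segment back to the non-duplicate path for its file.
SELECT fp.[FilePath],
       r.[Build],
       r.[SegmentId],
       r.[SegmentContent]
FROM [dbo].[Result] AS r
JOIN [dbo].[FilePath] AS fp
    ON fp.[FileMd5Hash] = r.[FileMd5Hash]
WHERE fp.[Duplicated] = 0;
```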
Since the number of rows in [Result] is very large, I would like to add an INT identity column and use it to join the tables instead, as shown below:
    CREATE TABLE [dbo].[FilePath]
    (
        [FilePath]     NVARCHAR(256) NOT NULL PRIMARY KEY,
        [FileMd5Hash]  BINARY(16)    NOT NULL,
        [Id]           INT           NOT NULL IDENTITY,  -- new column
        [Duplicated]   BIT           NOT NULL DEFAULT 0,
        [LastRunBuild] NVARCHAR(30)  NOT NULL DEFAULT 0
    )

    CREATE TABLE [dbo].[Result]
    (
        [Build]          NVARCHAR(30) NOT NULL,
        [Id]             INT          NOT NULL,           -- replaces [FileMd5Hash]
        [SegmentId]      INT          NOT NULL,
        [SegmentContent] TEXT         NOT NULL,
        PRIMARY KEY ([Id], [Build], [SegmentId])
    )
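With the second schema, the same query would join on the 4-byte INT column instead of the 16-byte hash. This is only a sketch: it assumes [Result].[Id] stores the [Id] of the one [FilePath] row whose file was processed, and the constraint name `UQ_FilePath_Id` is hypothetical.

```sql
-- Hypothetical constraint: keeps [Id] unique so it can act as a lookup key.
ALTER TABLE [dbo].[FilePath]
    ADD CONSTRAINT [UQ_FilePath_Id] UNIQUE ([Id]);

-- Join each stored segment back to its path through the INT key.
SELECT fp.[FilePath],
       r.[Build],
       r.[SegmentId],
       r.[SegmentContent]
FROM [dbo].[Result] AS r
JOIN [dbo].[FilePath] AS fp
    ON fp.[Id] = r.[Id];
```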
What are the pros and cons of these two approaches?
sql database sql-server hash
ricky