I have a table of buy/sell data that contains about 8 million records:
    CREATE TABLE [dbo].[Transactions](
        [id] [int] IDENTITY(1,1) NOT NULL,
        [itemId] [bigint] NOT NULL,
        [dt] [datetime] NOT NULL,
        [count] [int] NOT NULL,
        [price] [float] NOT NULL,
        [platform] [char](1) NOT NULL
    ) ON [PRIMARY]
Every X minutes, my program receives new transactions for each itemId, and I need to update the table. My first solution was a two-step DELETE + INSERT:
    DELETE FROM Transactions
    WHERE platform = @platform AND itemId = @itemid

    INSERT INTO Transactions (platform, itemId, dt, [count], price)
    VALUES (@platform, @itemid, @dt, @count, @price)
    [...]
    INSERT INTO Transactions (platform, itemId, dt, [count], price)
    VALUES (@platform, @itemid, @dt, @count, @price)
The problem is that this DELETE statement takes 5 seconds on average, which is too long.
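One mitigation I have considered but not measured (a sketch only) is running the DELETE in chunks, so each statement holds locks for a shorter time; the total work stays the same, so this would help with blocking rather than overall duration:

    -- untested sketch: chunked delete, same total work but shorter transactions
    DECLARE @rows int = 1;
    WHILE @rows > 0
    BEGIN
        DELETE TOP (5000) FROM Transactions
        WHERE platform = @platform AND itemId = @itemid;
        SET @rows = @@ROWCOUNT;
    END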
The second solution I found is to use MERGE. I created a stored procedure that accepts a table-valued parameter:
    CREATE PROCEDURE [dbo].[sp_updateTransactions]
        @Table dbo.tp_Transactions READONLY,
        @itemId bigint,
        @platform char(1)
    AS
    BEGIN
        MERGE Transactions AS TARGET
        USING @Table AS SOURCE
        ON (TARGET.[itemId]   = SOURCE.[itemId]
        AND TARGET.[platform] = SOURCE.[platform]
        AND TARGET.[dt]       = SOURCE.[dt]
        AND TARGET.[count]    = SOURCE.[count]
        AND TARGET.[price]    = SOURCE.[price])
        -- rows in the TVP that are not yet in the table: insert them
        -- (explicit column list so the IDENTITY [id] column is skipped)
        WHEN NOT MATCHED BY TARGET THEN
            INSERT ([itemId], [dt], [count], [price], [platform])
            VALUES (SOURCE.[itemId], SOURCE.[dt], SOURCE.[count], SOURCE.[price], SOURCE.[platform])
        -- rows in the table for this item/platform that are absent from the TVP: delete them
        WHEN NOT MATCHED BY SOURCE
            AND TARGET.[itemId] = @itemId
            AND TARGET.[platform] = @platform THEN
            DELETE;
    END
This procedure takes about 7 seconds on a table with 70 thousand records, so with 8M rows it would probably take a few minutes. The bottleneck is the WHEN NOT MATCHED BY SOURCE clause: when I comment that clause out, the procedure takes 0.01 seconds on average.
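One variant I have seen suggested but have not yet tried on my data (the CTE name below is my own) is to pre-filter the target through an updatable CTE, so that WHEN NOT MATCHED BY SOURCE is only evaluated against the rows for the current @itemId and @platform instead of the whole 8M-row table:

    WITH target_slice AS (
        -- restrict the MERGE target to the slice being refreshed
        SELECT [itemId], [dt], [count], [price], [platform]
        FROM Transactions
        WHERE [itemId] = @itemId AND [platform] = @platform
    )
    MERGE target_slice AS TARGET
    USING @Table AS SOURCE
    ON (TARGET.[itemId]   = SOURCE.[itemId]
    AND TARGET.[platform] = SOURCE.[platform]
    AND TARGET.[dt]       = SOURCE.[dt]
    AND TARGET.[count]    = SOURCE.[count]
    AND TARGET.[price]    = SOURCE.[price])
    WHEN NOT MATCHED BY TARGET THEN
        INSERT ([itemId], [dt], [count], [price], [platform])
        VALUES (SOURCE.[itemId], SOURCE.[dt], SOURCE.[count], SOURCE.[price], SOURCE.[platform])
    WHEN NOT MATCHED BY SOURCE THEN
        DELETE;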
So the question is: how can I improve the performance of this DELETE?
The DELETE is necessary to make sure the table does not contain transactions that were deleted in the application. That scenario is real but very rare: fewer than 1 in 10,000 transaction updates actually needs to delete anything.
My theoretical solution is to add an extra transactionDeleted bit column, use UPDATE instead of DELETE, and then purge the table with a batch job every X minutes or hours that executes:
    DELETE FROM Transactions WHERE transactionDeleted = 1
It should be faster, but I would need to update all SELECT statements in other parts of the application to read only transactionDeleted = 0 rows, which may also hurt application performance.
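To sketch what I mean (assumptions: the transactionDeleted column is new, the index name IX_Transactions_Live is my own, and I have not benchmarked any of this), a filtered index might soften the SELECT-performance concern:

    -- assumed new column for soft deletes
    ALTER TABLE [dbo].[Transactions]
        ADD transactionDeleted bit NOT NULL DEFAULT 0;

    -- mark rows instead of deleting them
    UPDATE Transactions
    SET transactionDeleted = 1
    WHERE platform = @platform AND itemId = @itemid;

    -- filtered index (SQL Server 2008+) so queries over live rows
    -- do not pay for rows awaiting the cleanup job
    CREATE NONCLUSTERED INDEX IX_Transactions_Live
        ON [dbo].[Transactions] ([platform], [itemId])
        WHERE transactionDeleted = 0;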
Do you know of any better solution?
UPDATE: Current indices:
    CREATE NONCLUSTERED INDEX [IX1] ON [dbo].[Transactions]
    (
        [platform] ASC,
        [ItemId] ASC
    )
    WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF,
          IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF,
          ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 50) ON [PRIMARY]

    ALTER TABLE [dbo].[Transactions] ADD CONSTRAINT [IX2] UNIQUE NONCLUSTERED
    (
        [ItemId] DESC,
        [count] ASC,
        [dt] DESC,
        [platform] ASC,
        [price] ASC
    )
    WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
          ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
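Since the table is a heap (there is no clustered index), I am also wondering whether clustering on the delete predicate would turn the per-item DELETE into a contiguous range delete; a sketch (untested, and the index name is my own):

    -- assumption: clustering on (platform, itemId) groups each item's rows
    -- together, so the DELETE touches one contiguous range; [id] is appended
    -- to make the key unique
    CREATE UNIQUE CLUSTERED INDEX CIX_Transactions
        ON [dbo].[Transactions] ([platform], [itemId], [dt], [id]);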