SQL Server performance and MERGE

I have a table of buy/sell data that contains about 8 million records:

    CREATE TABLE [dbo].[Transactions](
        [id] [int] IDENTITY(1,1) NOT NULL,
        [itemId] [bigint] NOT NULL,
        [dt] [datetime] NOT NULL,
        [count] [int] NOT NULL,
        [price] [float] NOT NULL,
        [platform] [char](1) NOT NULL
    ) ON [PRIMARY]

Every X minutes, my program receives new transactions for each itemId, and I need to update the table. My first solution was a two-step DELETE + INSERT:

    delete from Transactions where platform=@platform and itemid=@itemid

    insert into Transactions (platform,itemid,dt,count,price) values (@platform,@itemid,@dt,@count,@price)
    [...]
    insert into Transactions (platform,itemid,dt,count,price) values (@platform,@itemid,@dt,@count,@price)

The problem is that this DELETE statement takes an average of 5 seconds. This is too long.

The second solution I found is to use MERGE. I created a stored procedure that accepts a table-valued parameter:

    CREATE PROCEDURE [dbo].[sp_updateTransactions]
        @Table dbo.tp_Transactions READONLY,
        @itemId bigint,
        @platform char(1)
    AS
    BEGIN
        MERGE Transactions AS TARGET
        USING @Table AS SOURCE
        ON (
            TARGET.[itemId] = SOURCE.[itemId]
            AND TARGET.[platform] = SOURCE.[platform]
            AND TARGET.[dt] = SOURCE.[dt]
            AND TARGET.[count] = SOURCE.[count]
            AND TARGET.[price] = SOURCE.[price]
        )
        WHEN NOT MATCHED BY TARGET THEN
            INSERT ([itemId], [dt], [count], [price], [platform])
            VALUES (SOURCE.[itemId], SOURCE.[dt], SOURCE.[count], SOURCE.[price], SOURCE.[platform])
        WHEN NOT MATCHED BY SOURCE
            AND TARGET.[itemId] = @itemId
            AND TARGET.[platform] = @platform THEN
            DELETE;
    END
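The question does not show the definition of dbo.tp_Transactions, so here is a minimal sketch of what it presumably looks like (column types assumed to mirror dbo.Transactions), together with a hypothetical example invocation:

    -- Assumed definition of the table type; the question does not show it,
    -- so the columns are modeled on dbo.Transactions (minus the identity).
    CREATE TYPE dbo.tp_Transactions AS TABLE
    (
        [itemId]   bigint   NOT NULL,
        [dt]       datetime NOT NULL,
        [count]    int      NOT NULL,
        [price]    float    NOT NULL,
        [platform] char(1)  NOT NULL
    );
    GO

    -- Hypothetical invocation: load the incoming batch into the TVP
    -- and pass it to the procedure.
    DECLARE @batch dbo.tp_Transactions;
    INSERT INTO @batch ([itemId], [dt], [count], [price], [platform])
    VALUES (42, '2012-07-19T10:00:00', 3, 19.99, 'A');

    EXEC dbo.sp_updateTransactions
         @Table = @batch,
         @itemId = 42,
         @platform = 'A';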

This procedure takes about 7 seconds when the table has 70 thousand records, so with 8M it would probably take a few minutes. The bottleneck is the WHEN NOT MATCHED BY SOURCE clause: when I comment that line out, the procedure takes an average of 0.01 seconds.

So the question is: how to improve the execution of the delete statement?

The delete is necessary to ensure that the table does not contain transactions that were deleted in the application. That is a real scenario, but it happens very rarely: the actual need to delete records is less than 1 per 10,000 transaction updates.

My theoretical solution is to add an extra transactionDeleted bit column and use UPDATE instead of DELETE, then purge the table with a batch job every X minutes or hours, executing:

 delete from transactions where transactionDeleted=1 

It should be faster, but I would need to update all SELECT statements in other parts of the application to use only rows with transactionDeleted = 0, which could in turn hurt application performance.
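A minimal sketch of that idea, assuming the column is called transactionDeleted as in the snippet above (the default-constraint name is made up):

    -- Add the flag column; the default keeps existing rows visible.
    ALTER TABLE dbo.Transactions
        ADD transactionDeleted bit NOT NULL
            CONSTRAINT DF_Transactions_transactionDeleted DEFAULT (0);

    -- Soft delete instead of a physical DELETE (@platform and @itemId
    -- stand for the same parameters as in the original statements).
    UPDATE dbo.Transactions
    SET transactionDeleted = 1
    WHERE platform = @platform AND itemId = @itemId;

    -- Every SELECT elsewhere then needs the extra predicate, e.g.:
    SELECT [itemId], [dt], [count], [price], [platform]
    FROM dbo.Transactions
    WHERE transactionDeleted = 0;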

Do you know any better solution?

UPDATE: Current indexes:

    CREATE NONCLUSTERED INDEX [IX1] ON [dbo].[Transactions]
    (
        [platform] ASC,
        [ItemId] ASC
    )
    WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF,
          IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF,
          ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 50) ON [PRIMARY]

    CONSTRAINT [IX2] UNIQUE NONCLUSTERED
    (
        [ItemId] DESC,
        [count] ASC,
        [dt] DESC,
        [platform] ASC,
        [price] ASC
    )
    WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
          ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
+7
3 answers

OK, here is another approach. For a similar problem (a large scan caused by WHEN NOT MATCHED BY SOURCE followed by DELETE), I reduced the MERGE execution time from 806 ms to 6 ms!

One problem with the setup above is that the WHEN NOT MATCHED BY SOURCE clause scans the entire TARGET table.

It is not widely known, but SQL Server allows you to filter the TARGET table (using a CTE) BEFORE performing the merge. In my case, the TARGET rows were reduced from 250K to fewer than 10 rows. A big difference.

Assuming that in the problem above the TARGET can be filtered by @itemId and @platform, the MERGE code would look like this. The index changes suggested in another answer will also help this logic.

    WITH Transactions_CTE (itemId, dt, count, price, platform)
    AS
    -- Define the CTE query that will reduce the size of the TARGET table.
    (
        SELECT itemId, dt, count, price, platform
        FROM Transactions
        WHERE itemId = @itemId
          AND platform = @platform
    )
    MERGE Transactions_CTE AS TARGET
    USING @Table AS SOURCE
    ON (
        TARGET.[itemId] = SOURCE.[itemId]
        AND TARGET.[platform] = SOURCE.[platform]
        AND TARGET.[dt] = SOURCE.[dt]
        AND TARGET.[count] = SOURCE.[count]
        AND TARGET.[price] = SOURCE.[price]
    )
    WHEN NOT MATCHED BY TARGET THEN
        INSERT VALUES (
            SOURCE.[itemId],
            SOURCE.[dt],
            SOURCE.[count],
            SOURCE.[price],
            SOURCE.[platform]
        )
    WHEN NOT MATCHED BY SOURCE THEN
        DELETE;
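Folded back into the stored procedure from the question, a sketch might look like the following; note that the extra @itemId/@platform filter on the DELETE branch is no longer needed, because the CTE already restricts the TARGET rows:

    -- Sketch: the original procedure rewritten to merge against the
    -- filtered CTE instead of the whole table.
    ALTER PROCEDURE [dbo].[sp_updateTransactions]
        @Table dbo.tp_Transactions READONLY,
        @itemId bigint,
        @platform char(1)
    AS
    BEGIN
        WITH Transactions_CTE ([itemId], [dt], [count], [price], [platform])
        AS
        (
            SELECT [itemId], [dt], [count], [price], [platform]
            FROM Transactions
            WHERE [itemId] = @itemId AND [platform] = @platform
        )
        MERGE Transactions_CTE AS TARGET
        USING @Table AS SOURCE
        ON (
            TARGET.[itemId] = SOURCE.[itemId]
            AND TARGET.[platform] = SOURCE.[platform]
            AND TARGET.[dt] = SOURCE.[dt]
            AND TARGET.[count] = SOURCE.[count]
            AND TARGET.[price] = SOURCE.[price]
        )
        WHEN NOT MATCHED BY TARGET THEN
            INSERT VALUES (SOURCE.[itemId], SOURCE.[dt], SOURCE.[count],
                           SOURCE.[price], SOURCE.[platform])
        WHEN NOT MATCHED BY SOURCE THEN
            DELETE;
    END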
+3

Using a BIT field for IsDeleted (or IsActive, as many people do) is valid, but it requires modifying all of the code, as well as creating a separate SQL job to periodically sweep through and remove the "deleted" records. That may be the way to go, but there is something less intrusive to try first.

I noticed in your set of two indexes that neither is CLUSTERED. May I assume that the IDENTITY field is? You might consider making [IX2], the UNIQUE index, CLUSTERED, and changing the PK (again, I assume the PK on the IDENTITY field is the CLUSTERED one) to be NONCLUSTERED. I would also reorder the IX2 fields to put [Platform] and [ItemID] first. Since your main operation looks for [Platform] and [ItemID] as a set, physically ordering them this way may help. And since this index is unique, it is a good candidate for being CLUSTERED. It is certainly worth testing, as it will affect all queries against the table.

Also, if changing the indexes as I suggested helps, it might still be worth trying both ideas, hence also adding the IsDeleted field, to see whether that improves performance further.

EDIT: I forgot to mention: by making the IX2 index CLUSTERED and moving the [Platform] field to the top, you should get rid of the IX1 index.

EDIT2:

To be extremely clear, I suggest something like:

    CREATE UNIQUE CLUSTERED INDEX [IX2] ON [dbo].[Transactions]
    (
        [ItemId] DESC,
        [platform] ASC,
        [count] ASC,
        [dt] DESC,
        [price] ASC
    )

And, to be fair, changing the CLUSTERED index could also negatively affect queries that JOIN on the [id] field, so you need to test it thoroughly. In the end, you should tune the system for your most frequent and/or expensive queries, and you may have to accept that some queries will become slower as a result, but that can be worth it to make this operation much faster.
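Putting the whole suggestion together, a hedged sketch of the DDL sequence; the primary-key constraint name PK_Transactions is an assumption, since the question does not show it:

    -- 1. Drop IX1; per the suggestion above it becomes unnecessary
    --    once IX2 is clustered.
    DROP INDEX [IX1] ON [dbo].[Transactions];

    -- 2. Drop the existing unique constraint so it can be recreated
    --    as a clustered index.
    ALTER TABLE [dbo].[Transactions] DROP CONSTRAINT [IX2];

    -- 3. Rebuild the PK on the identity column as NONCLUSTERED.
    --    (PK_Transactions is assumed; check sys.key_constraints
    --    for the real name.)
    ALTER TABLE [dbo].[Transactions] DROP CONSTRAINT [PK_Transactions];
    ALTER TABLE [dbo].[Transactions]
        ADD CONSTRAINT [PK_Transactions] PRIMARY KEY NONCLUSTERED ([id]);

    -- 4. Recreate IX2 as the UNIQUE CLUSTERED index.
    CREATE UNIQUE CLUSTERED INDEX [IX2] ON [dbo].[Transactions]
    (
        [ItemId] DESC,
        [platform] ASC,
        [count] ASC,
        [dt] DESC,
        [price] ASC
    );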

+2

From a related Stack Overflow discussion:

Will the update cost the same as the delete? No. The update will be a much lighter operation, especially if you have an index on the PK (errr, which is a GUID, not an int). The point is that updating a bit field is much less expensive. A (bulk) delete forces a reshuffle of the data.

In light of this, your idea of using a bit field looks very sound.

0
