Deleting a large number of rows from SQL Server efficiently and without blocking

I am writing a procedure to delete all rows older than n days from multiple tables.

The query itself is simple and easy to write:

DELETE FROM [myTable] WHERE [Created] < GETDATE()-30 

One problem is that there is no index on the date field. I could add one, but instead I worked around it with something like:

 DECLARE @var int; SELECT @var = MAX([ID]) FROM myTable WHERE [Created] < GETDATE()-30;
 DELETE FROM myTable WHERE [ID] <= @var; -- only valid if ID always increases with Created

Does this sound like an acceptable method?

The problem is that the table is huge, and this query will probably delete hundreds of thousands of rows in each run.

Running it on a (somewhat slow) test server takes about an hour, and during that time it blocks other processes that try to read from or write to the table.

I don't mind if it takes a while (although faster would be nice), but I cannot have it lock the table for an hour while it runs, because there is constant read/write activity (mostly writes).

My database knowledge is fairly basic, since I am a developer rather than a DBA.

Could someone suggest a decent way to accomplish this as efficiently as possible?

5 answers

What you are looking for is a partitioned sliding window; see "How to Implement an Automatic Sliding Window in a Partitioned Table on SQL Server 2005". Partition the table by day, and you can efficiently drop an entire day in a single partition switch operation at midnight. A partition switch is essentially instant, because it only changes metadata.
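A minimal T-SQL sketch of the daily switch-out, assuming SQL Server 2005 Enterprise Edition (table partitioning is not available in the other 2005 editions), a single [PRIMARY] filegroup, and hypothetical names throughout (pfDaily, psDaily, myTable_Staging, the Payload column, and the example dates). It only shows the core steps; the article above covers automating the boundary maintenance.

```sql
-- Hypothetical daily partition function: RANGE RIGHT makes each boundary date
-- the first value of its own partition, so partition 1 holds everything older.
CREATE PARTITION FUNCTION pfDaily (datetime)
    AS RANGE RIGHT FOR VALUES ('20090101', '20090102', '20090103');
GO
CREATE PARTITION SCHEME psDaily
    AS PARTITION pfDaily ALL TO ([PRIMARY]);
GO
-- Clustered key starts with Created and the table is built on the scheme,
-- so the index is aligned with the partitions (required for SWITCH).
CREATE TABLE dbo.myTable (
    ID      int IDENTITY(1,1) NOT NULL,
    Created datetime          NOT NULL,
    Payload varchar(100)      NULL,
    CONSTRAINT PK_myTable PRIMARY KEY CLUSTERED (Created, ID)
) ON psDaily (Created);
GO
-- Empty staging table with an identical structure, on the same filegroup
-- as the partition being switched out.
CREATE TABLE dbo.myTable_Staging (
    ID      int IDENTITY(1,1) NOT NULL,
    Created datetime          NOT NULL,
    Payload varchar(100)      NULL,
    CONSTRAINT PK_myTable_Staging PRIMARY KEY CLUSTERED (Created, ID)
) ON [PRIMARY];
GO
-- Nightly job: move the oldest day out of myTable (metadata-only), then discard it.
ALTER TABLE dbo.myTable SWITCH PARTITION 1 TO dbo.myTable_Staging;
TRUNCATE TABLE dbo.myTable_Staging;

-- Drop the oldest boundary and add one for the next day.
ALTER PARTITION FUNCTION pfDaily() MERGE RANGE ('20090101');
ALTER PARTITION SCHEME psDaily NEXT USED [PRIMARY];
ALTER PARTITION FUNCTION pfDaily() SPLIT RANGE ('20090104');
```

The unambiguous 'YYYYMMDD' literal format is used so the boundaries do not depend on the session's DATEFORMAT or language settings.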

If you want a solution with somewhat lower overhead (partitioning has serious consequences that ripple through the whole application, especially the need to align indexes, which is a requirement for fast switch operations), then you need to design your schema around this operation. I can say with 99.99% confidence that the leftmost key of the clustered index on myTable should be the Created column. This allows efficient batched deletes ( delete top (2500) from myTable where Created < ... ). There are several reasons to delete in batches of around 2500 rows at a time; the most important are that you avoid lock escalation and keep the size of any single transaction within reasonable limits.
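A sketch of the batched approach, assuming the clustered index can be put on Created. The index name CIX_myTable_Created is hypothetical, and the CREATE fails if the table already has a clustered index (for example a clustered primary key on ID, which would first have to be recreated as nonclustered). Each DELETE runs as its own autocommitted transaction, and keeping each batch well below the roughly 5,000-lock threshold at which SQL Server attempts lock escalation lets other sessions work between batches.

```sql
-- Assumes myTable is currently a heap (no clustered index yet).
CREATE CLUSTERED INDEX CIX_myTable_Created ON dbo.myTable (Created);
GO
-- SQL Server 2005 does not allow initializers in DECLARE, hence the SETs.
DECLARE @cutoff datetime, @batch int, @rows int;
SET @cutoff = DATEADD(day, -30, GETDATE());  -- computed once, not per batch
SET @batch  = 2500;
SET @rows   = @batch;                        -- forces the first iteration

-- Each DELETE is its own autocommitted transaction; stop when nothing is left.
WHILE @rows > 0
BEGIN
    DELETE TOP (@batch) FROM dbo.myTable WHERE Created < @cutoff;
    SET @rows = @@ROWCOUNT;

    -- Optional short pause so other sessions can get their locks in between.
    WAITFOR DELAY '00:00:00.100';
END
```

Because Created leads the clustered key, each batch reads a contiguous range at the start of the index instead of scanning the table.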


Your method suffers from the same problem as the plain delete: there is no index on [Created], so the SELECT MAX([ID]) still has to scan the whole table. That makes your method more convoluted without making it faster.

I would advise you to create that index and try the plain delete on the test server.
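A minimal sketch of that test, assuming a nonclustered index is acceptable on this table (the index name is hypothetical). If the old rows make up a large fraction of the table, the optimizer may still choose a scan, so the timing on the test server is what matters.

```sql
-- Hypothetical index name; lets the DELETE locate old rows with a range seek
-- on Created instead of scanning the whole table.
CREATE NONCLUSTERED INDEX IX_myTable_Created ON dbo.myTable (Created);
GO
-- The plain delete from the question, unchanged.
DELETE FROM dbo.myTable WHERE Created < GETDATE() - 30;
```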

Another suggestion is to run this outside normal business hours through a scheduler, such as a SQL Server Agent job.
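One way to schedule it, sketched with the SQL Server Agent procedures in msdb and assuming the delete has been wrapped in a stored procedure. The names dbo.PurgeOldRows and MyDatabase, the job and schedule names, and the 02:00 start time are all hypothetical.

```sql
USE msdb;
GO
-- Create the job with a single T-SQL step that calls the (hypothetical) purge procedure.
EXEC dbo.sp_add_job
    @job_name = N'Purge old rows';
EXEC dbo.sp_add_jobstep
    @job_name      = N'Purge old rows',
    @step_name     = N'Run purge',
    @subsystem     = N'TSQL',
    @database_name = N'MyDatabase',
    @command       = N'EXEC dbo.PurgeOldRows;';
-- Daily schedule (freq_type 4 = daily) starting at 02:00:00 (HHMMSS as an int).
EXEC dbo.sp_add_schedule
    @schedule_name     = N'Nightly 2am',
    @freq_type         = 4,
    @freq_interval     = 1,
    @active_start_time = 020000;
EXEC dbo.sp_attach_schedule
    @job_name      = N'Purge old rows',
    @schedule_name = N'Nightly 2am';
-- Target the local server so SQL Server Agent will actually run the job.
EXEC dbo.sp_add_jobserver
    @job_name = N'Purge old rows';
```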


To improve performance, consider creating an index on the Created column if this is something you will do often.

Then you can use

 DELETE FROM myTable WHERE Created < GETDATE()-30 

I have seen processes that took many hours reduced to a few seconds by adding the appropriate index and statistics.

Indexes are easy to create, and there are tools that suggest indexes and generate the syntax for you, for example the Database Engine Tuning Advisor that ships with SQL Server 2005 (launchable from Management Studio).
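Besides the Tuning Advisor, SQL Server 2005 also records the optimizer's index suggestions in the missing-index dynamic management views. A sketch of a query that lists them for the current database; it needs VIEW SERVER STATE permission, and the data is reset when the instance restarts. A predicate such as Created < ... would show up under inequality_columns.

```sql
-- Missing-index suggestions recorded by the optimizer for the current database.
SELECT
    d.statement        AS table_name,
    d.equality_columns,
    d.inequality_columns,
    d.included_columns,
    s.user_seeks,
    s.avg_user_impact  -- estimated average % cost reduction for user queries
FROM sys.dm_db_missing_index_details AS d
INNER JOIN sys.dm_db_missing_index_groups AS g
    ON g.index_handle = d.index_handle
INNER JOIN sys.dm_db_missing_index_group_stats AS s
    ON s.group_handle = g.index_group_handle
WHERE d.database_id = DB_ID()
ORDER BY s.user_seeks * s.avg_user_impact DESC;
```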


I assume that you cannot index the Created column (since otherwise that would be the logical place to start). Given that assumption, you will have problems with both performance and blocking. However, since you are using SQL Server 2005, you can take advantage of some of the new features described in this article: http://nayyeri.net/reduce-locks-for-delete-and-update-commands-in-sql-server-2005-with-top-clause

Basically, write a query that selects all the records you want to affect and insert their row IDs (which are indexed) into a temp table. Join the temp table to the table you want to delete from on the ID. Then use the batched delete described in that article to delete the rows a group at a time.

So you build the temp table from your date criteria (this step will not be efficient because there is no index, but you can use NOLOCK so that it does not block other sessions, at the cost of possibly reading uncommitted data). Then you delete from the main table in batches to reduce the blocking caused by the actual delete.
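A sketch of that approach, assuming ID is a unique, indexed int column on myTable; the temp table names and the batch size are hypothetical. Rather than rejoining the whole ID list on every pass, each iteration moves one batch of IDs out of the work list with an OUTPUT clause (available from SQL Server 2005) and deletes only those rows.

```sql
DECLARE @cutoff datetime, @batch int;
SET @cutoff = DATEADD(day, -30, GETDATE());
SET @batch  = 2500;

CREATE TABLE #ToDelete (ID int NOT NULL PRIMARY KEY);
CREATE TABLE #Batch    (ID int NOT NULL PRIMARY KEY);

-- Step 1: one scanning pass to collect the IDs. NOLOCK takes no shared locks,
-- so it does not block writers, but it may read uncommitted rows.
INSERT INTO #ToDelete (ID)
SELECT ID FROM dbo.myTable WITH (NOLOCK) WHERE Created < @cutoff;

-- Step 2: delete in small batches, each in its own autocommitted statement.
WHILE 1 = 1
BEGIN
    -- Move the next batch of IDs off the work list.
    DELETE TOP (@batch) FROM #ToDelete
    OUTPUT deleted.ID INTO #Batch (ID);

    IF @@ROWCOUNT = 0 BREAK;

    -- Delete matching rows via the indexed ID; re-checking Created guards
    -- against anything picked up by the dirty (NOLOCK) read.
    DELETE t
    FROM dbo.myTable AS t
    INNER JOIN #Batch AS b ON b.ID = t.ID
    WHERE t.Created < @cutoff;

    TRUNCATE TABLE #Batch;
END

DROP TABLE #Batch;
DROP TABLE #ToDelete;
```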


Creating an index and running the delete outside business hours is probably the best thing to do. However, if those are not options, you can create a view based on your query and delete from that view, so that you only need to touch the table once rather than twice, which saves I/O.

 CREATE VIEW v1 AS SELECT * FROM myTable WHERE Created < GETDATE()-30;
 GO -- CREATE VIEW must be the only statement in its batch
 DELETE FROM v1;
