How to efficiently delete rows from a 500,000-row table without using TRUNCATE TABLE

Say we have a Sales table with 30 columns and 500,000 rows. I would like to delete 400,000 of those rows (the ones where "toDelete='1'").

But I have a few limitations:

  • the table is read and written often, so I can't run one long delete that locks the table for a long time.
  • I would like to skip the transaction log (the way TRUNCATE does), but I need a condition ("DELETE ... WHERE ..."), and I have not found a way to do both.

Any tips would be helpful for converting

 DELETE FROM Sales WHERE toDelete='1' 

into something more batched that goes easier on the transaction log.
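A note on the logging constraint: a DELETE is always fully logged and there is no filtered equivalent of TRUNCATE, but small batches keep each transaction short, and under the SIMPLE recovery model a CHECKPOINT between batches lets log space be reused. A minimal sketch (batch size of 5,000 is an assumption; table and column names are from the question):

```sql
-- Sketch, assuming SIMPLE recovery: delete in short batches so no single
-- transaction grows large, and let the log truncate between batches.
WHILE 1 = 1
BEGIN
    DELETE TOP (5000) FROM Sales WHERE toDelete = '1';
    IF @@ROWCOUNT = 0 BREAK;
    CHECKPOINT;  -- in SIMPLE recovery, allows log space reuse between batches
END
```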

+24
sql tsql sql-server-2008 sql-delete truncate
Jun 27 '12 at 15:48
8 answers

Calling DELETE FROM TableName does all the deletion in one big transaction. That is expensive.

Here is another option that will delete rows in batches:

 deleteMore:
 DELETE TOP (10000) FROM Sales WHERE toDelete = '1'
 IF @@ROWCOUNT != 0 GOTO deleteMore
+33
Jun 27 '12 at 15:55

What you want is batch processing.

 WHILE (SELECT COUNT(*) FROM Sales WHERE toDelete = 1) > 0
 BEGIN
     DELETE FROM Sales
     WHERE SalesID IN (SELECT TOP 1000 SalesID FROM Sales WHERE toDelete = 1)
 END

Of course, you can experiment to find the best batch size; I have used anywhere from 500 to 50,000 depending on the table. If you use cascading deletes, you will probably want a smaller number, since the child records have to be deleted as well.

+10
Jun 27 '12 at 15:56

One way I have done this in the past is with a stored procedure or script that deletes n records at a time, repeated until no matching rows are left.

 DELETE TOP (1000) FROM Sales WHERE toDelete = '1' 
+5
Jun 27 '12 at 15:52

You could try the ROWLOCK hint so that SQL Server does not lock the entire table. However, if you delete many rows, the lock will escalate anyway.

Also, make sure there is a non-clustered filtered index on the toDelete column (filtered to the value 1 only). If possible, make it a bit column rather than varchar (or whatever it is now).
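For reference, the filtered index this answer describes could look like the following (a sketch; it assumes toDelete has already been converted to a bit column as suggested, and the index name is made up):

```sql
-- Hypothetical filtered index: stays small because it only covers
-- the rows that are candidates for deletion.
CREATE NONCLUSTERED INDEX IX_Sales_toDelete
    ON Sales (toDelete)
    WHERE toDelete = 1;
```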

 DELETE FROM Sales WITH(ROWLOCK) WHERE toDelete='1' 

Ultimately, you can iterate over the table and delete in chunks.

Update

Since while loops and chunked deletes are all the rage here, I will add my version as well (combined with my previous answer):

 SET ROWCOUNT 100
 DELETE FROM Sales WITH (ROWLOCK) WHERE toDelete = '1'

 WHILE @@ROWCOUNT > 0
 BEGIN
     SET ROWCOUNT 100
     DELETE FROM Sales WITH (ROWLOCK) WHERE toDelete = '1'
 END
+3
Jun 27 '12 at 15:53

My own approach to this would be the following. There is no duplicated code, and you can control the chunk size.

 DECLARE @DeleteChunk INT = 10000
 DECLARE @rowcount INT = 1

 WHILE @rowcount > 0
 BEGIN
     DELETE TOP (@DeleteChunk) FROM Sales WITH (ROWLOCK)
     WHERE toDelete = '1'

     SELECT @rowcount = @@ROWCOUNT
 END
+3
Nov 23 '16 at 2:37

I will leave my answer here, since I was able to test several approaches to bulk deleting and updating (I had to update and then delete 125 million rows; the server had 16 GB of RAM and a Xeon E5-2680 @ 2.7 GHz, running SQL Server 2012).

TL;DR: always update/delete by primary key, never by other conditions. If you cannot use the PK directly, create a temporary table, fill it with PK values, and update/delete your table by joining to that table. Index it accordingly.

I started with the solution above (@Kevin Aenmey), but that approach turned out to be unsuitable, since my database was live, handling a couple of hundred transactions per second, and there was some locking (there was an index covering all the fields in the condition; using WITH(ROWLOCK) changed nothing).

So I added a WAITFOR DELAY, which gave the database time to process other transactions.

 deleteMore:
 WAITFOR DELAY '00:00:01'

 DELETE TOP (1000) FROM MyTable
 WHERE Column1 = @Criteria1 AND Column2 = @Criteria2 AND Column3 = @Criteria3

 IF @@ROWCOUNT != 0 GOTO deleteMore

This approach was able to process ~1.6 million rows/hour for updates and ~0.2 million rows/hour for deletes.

Switching to temporary tables changed everything.

 deleteMore:
 SELECT TOP 10000 Id /* Id is the PK */
 INTO #Temp
 FROM MyTable
 WHERE Column1 = @Criteria1 AND Column2 = @Criteria2 AND Column3 = @Criteria3

 DELETE MT
 FROM MyTable MT
 JOIN #Temp T ON T.Id = MT.Id
 /* you can use the IN operator instead, it doesn't change anything:
    DELETE FROM MyTable WHERE Id IN (SELECT Id FROM #Temp) */

 IF @@ROWCOUNT > 0
 BEGIN
     DROP TABLE #Temp
     WAITFOR DELAY '00:00:01'
     GOTO deleteMore
 END
 ELSE
 BEGIN
     DROP TABLE #Temp
     PRINT 'This is the end, my friend'
 END

This solution processed ~25 million rows/hour for updates (15× faster) and ~2.2 million rows/hour for deletes (11× faster).
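One possible refinement of the "use indexes" advice from the TL;DR above (my assumption, not something the answer benchmarked): give the temp table a clustered index on the collected PK values before joining it back, so the delete's join becomes a cheap seek. The index name is hypothetical:

```sql
-- Sketch: build the batch, index it, then delete via the join
SELECT TOP 10000 Id /* Id is the PK */
INTO #Temp
FROM MyTable
WHERE Column1 = @Criteria1 AND Column2 = @Criteria2 AND Column3 = @Criteria3;

CREATE CLUSTERED INDEX IX_Temp_Id ON #Temp (Id);  -- hypothetical name

DELETE MT
FROM MyTable MT
JOIN #Temp T ON T.Id = MT.Id;

DROP TABLE #Temp;
```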

+2
Mar 06 '19 at 10:39

I used the following to delete about 50 million records:

 DECLARE @BatchSize INT = 4000

 BEGIN TRANSACTION
 DeleteOperation:
 DELETE TOP (@BatchSize) FROM [database_name].[database_schema].[database_table]
 IF @@ROWCOUNT > 0 GOTO DeleteOperation
 COMMIT TRANSACTION

Note that keeping BatchSize under 5,000 is easier on resources: SQL Server attempts lock escalation at roughly 5,000 locks, so smaller batches avoid escalating to a table lock.

+1
Jan 09 '17 at 5:47

I think the best way to delete a huge number of records is to delete them by their Primary Key. (See here for what a Primary Key is.)

So you generate a T-SQL script that contains a DELETE statement for every row to be removed, and then run that script.

For example, the code below generates such a script:

 GO
 SET NOCOUNT ON

 SELECT 'DELETE FROM DATA_ACTION WHERE ID = ' + CAST(ID AS VARCHAR(50)) + ';'
        + CHAR(13) + CHAR(10) + 'GO'
 FROM DATA_ACTION
 WHERE YEAR(AtTime) = 2014

The output file will contain entries like

 DELETE FROM DATA_ACTION WHERE ID = 123;
 GO
 DELETE FROM DATA_ACTION WHERE ID = 124;
 GO
 DELETE FROM DATA_ACTION WHERE ID = 125;
 GO

And now you need to use the SQLCMD utility to execute this script.

 sqlcmd -S [Instance Name] -E -d [Database] -i [Script] 

This approach can be found here https://www.mssqltips.com/sqlservertip/3566/deleting-historical-data-from-a-large-highly-concurrent-sql-server-database-table/

0
Aug 30 '17 at 8:29
