How to efficiently delete rows from a 500,000-row table without using TRUNCATE TABLE

Say we have a Sales table with 30 columns and 500,000 rows. I would like to delete 400,000 of those rows (the ones where "toDelete='1'").

But I have a few limitations:

  • the table is read and written often, so I can't run one long delete that locks the table for a long time.
  • I would like to skip the transaction log (the way TRUNCATE does), but I need a condition ("DELETE ... WHERE ..."), and I have not found a way to do both.

Any tips would be helpful for converting

 DELETE FROM Sales WHERE toDelete='1' 

into something more batched that goes easier on the transaction log.
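A note on the logging constraint: a DELETE is always fully logged and there is no filtered equivalent of TRUNCATE, but small batches keep each transaction short, and under the SIMPLE recovery model a CHECKPOINT between batches lets log space be reused. A minimal sketch (batch size of 5,000 is an assumption; table and column names are from the question):

```sql
-- Sketch, assuming SIMPLE recovery: delete in short batches so no single
-- transaction grows large, and let the log truncate between batches.
WHILE 1 = 1
BEGIN
    DELETE TOP (5000) FROM Sales WHERE toDelete = '1';
    IF @@ROWCOUNT = 0 BREAK;
    CHECKPOINT;  -- in SIMPLE recovery, allows log space reuse between batches
END
```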

+24
sql tsql sql-server-2008 sql-delete truncate
Jun 27 '12 at 15:48
8 answers

Calling DELETE FROM TableName does all the deletion in one big transaction. That is expensive.

Here is another option that will delete rows in batches:

 deleteMore:
 DELETE TOP (10000) FROM Sales WHERE toDelete = '1'
 IF @@ROWCOUNT != 0 GOTO deleteMore
+33
Jun 27 '12 at 15:55

What you want is batch processing.

 WHILE (SELECT COUNT(*) FROM Sales WHERE toDelete = 1) > 0
 BEGIN
     DELETE FROM Sales
     WHERE SalesID IN (SELECT TOP 1000 SalesID FROM Sales WHERE toDelete = 1)
 END

Of course, you can experiment to find the best batch size; I have used anywhere from 500 to 50,000 depending on the table. If you use cascading deletes, you will probably want a smaller number, since the child records have to be deleted as well.

+10
Jun 27 '12 at 15:56

One way I have done this in the past is with a stored procedure or script that deletes n records at a time, repeated until no matching rows are left.

 DELETE TOP (1000) FROM Sales WHERE toDelete = '1' 
+5
Jun 27 '12 at 15:52

You could try the ROWLOCK hint so that SQL Server does not lock the entire table. However, if you delete many rows, the lock will escalate anyway.

Also, make sure there is a non-clustered filtered index on the toDelete column (filtered to the value 1 only). If possible, make it a bit column rather than varchar (or whatever it is now).
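For reference, the filtered index this answer describes could look like the following (a sketch; it assumes toDelete has already been converted to a bit column as suggested, and the index name is made up):

```sql
-- Hypothetical filtered index: stays small because it only covers
-- the rows that are candidates for deletion.
CREATE NONCLUSTERED INDEX IX_Sales_toDelete
    ON Sales (toDelete)
    WHERE toDelete = 1;
```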

 DELETE FROM Sales WITH(ROWLOCK) WHERE toDelete='1' 

Ultimately, you can iterate over the table and delete in chunks.

Update

Since while loops and chunked deletes are all the rage here, I will add my version as well (combined with my previous answer):

 SET ROWCOUNT 100
 DELETE FROM Sales WITH (ROWLOCK) WHERE toDelete = '1'

 WHILE @@ROWCOUNT > 0
 BEGIN
     SET ROWCOUNT 100
     DELETE FROM Sales WITH (ROWLOCK) WHERE toDelete = '1'
 END
+3
Jun 27 '12 at 15:53

My own approach to this would be the following. There is no duplicated code, and you can control the chunk size.

 DECLARE @DeleteChunk INT = 10000
 DECLARE @rowcount INT = 1

 WHILE @rowcount > 0
 BEGIN
     DELETE TOP (@DeleteChunk) FROM Sales WITH (ROWLOCK)
     WHERE toDelete = '1'

     SELECT @rowcount = @@ROWCOUNT
 END
+3
Nov 23 '16 at 2:37

I will leave my answer here, since I was able to test several approaches to bulk deleting and updating (I had to update and then delete 125 million rows; the server had 16 GB of RAM and a Xeon E5-2680 @ 2.7 GHz, running SQL Server 2012).

TL;DR: always update/delete by primary key, never by other conditions. If you cannot use the PK directly, create a temporary table, fill it with PK values, and update/delete your table by joining to that table. Index it accordingly.

I started with the solution above (@Kevin Aenmey), but that approach turned out to be unsuitable, since my database was live, handling a couple of hundred transactions per second, and there was some locking (there was an index covering all the fields in the condition; using WITH(ROWLOCK) changed nothing).

So I added a WAITFOR DELAY, which gave the database time to process other transactions.

 deleteMore:
 WAITFOR DELAY '00:00:01'

 DELETE TOP (1000) FROM MyTable
 WHERE Column1 = @Criteria1 AND Column2 = @Criteria2 AND Column3 = @Criteria3

 IF @@ROWCOUNT != 0 GOTO deleteMore

This approach was able to process ~1.6 million rows/hour for updates and ~0.2 million rows/hour for deletes.

Switching to temporary tables changed everything.

 deleteMore:
 SELECT TOP 10000 Id /* Id is the PK */
 INTO #Temp
 FROM MyTable
 WHERE Column1 = @Criteria1 AND Column2 = @Criteria2 AND Column3 = @Criteria3

 DELETE MT
 FROM MyTable MT
 JOIN #Temp T ON T.Id = MT.Id
 /* you can use the IN operator instead, it doesn't change anything:
    DELETE FROM MyTable WHERE Id IN (SELECT Id FROM #Temp) */

 IF @@ROWCOUNT > 0
 BEGIN
     DROP TABLE #Temp
     WAITFOR DELAY '00:00:01'
     GOTO deleteMore
 END
 ELSE
 BEGIN
     DROP TABLE #Temp
     PRINT 'This is the end, my friend'
 END

This solution processed ~25 million rows/hour for updates (15× faster) and ~2.2 million rows/hour for deletes (11× faster).
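One possible refinement of the "use indexes" advice from the TL;DR above (my assumption, not something the answer benchmarked): give the temp table a clustered index on the collected PK values before joining it back, so the delete's join becomes a cheap seek. The index name is hypothetical:

```sql
-- Sketch: build the batch, index it, then delete via the join
SELECT TOP 10000 Id /* Id is the PK */
INTO #Temp
FROM MyTable
WHERE Column1 = @Criteria1 AND Column2 = @Criteria2 AND Column3 = @Criteria3;

CREATE CLUSTERED INDEX IX_Temp_Id ON #Temp (Id);  -- hypothetical name

DELETE MT
FROM MyTable MT
JOIN #Temp T ON T.Id = MT.Id;

DROP TABLE #Temp;
```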

+2
Mar 06 '19 at 10:39

I used the following to delete about 50 million records:

 DECLARE @BatchSize INT = 4000

 BEGIN TRANSACTION
 DeleteOperation:
 DELETE TOP (@BatchSize) FROM [database_name].[database_schema].[database_table]
 IF @@ROWCOUNT > 0 GOTO DeleteOperation
 COMMIT TRANSACTION

Note that keeping BatchSize under 5,000 is easier on resources: SQL Server attempts lock escalation at roughly 5,000 locks, so smaller batches avoid escalating to a table lock.

+1
Jan 09 '17 at 5:47

I think the best way to delete a huge number of records is to delete them by their Primary Key. (See here for what a Primary Key is.)

So you generate a T-SQL script that contains a DELETE statement for every row to be removed, and then run that script.

For example, the code below generates such a script:

 GO
 SET NOCOUNT ON

 SELECT 'DELETE FROM DATA_ACTION WHERE ID = ' + CAST(ID AS VARCHAR(50)) + ';'
        + CHAR(13) + CHAR(10) + 'GO'
 FROM DATA_ACTION
 WHERE YEAR(AtTime) = 2014

The output file will contain entries like

 DELETE FROM DATA_ACTION WHERE ID = 123;
 GO
 DELETE FROM DATA_ACTION WHERE ID = 124;
 GO
 DELETE FROM DATA_ACTION WHERE ID = 125;
 GO

And now you need to use the SQLCMD utility to execute this script.

 sqlcmd -S [Instance Name] -E -d [Database] -i [Script] 

This approach can be found here https://www.mssqltips.com/sqlservertip/3566/deleting-historical-data-from-a-large-highly-concurrent-sql-server-database-table/

0
Aug 30 '17 at 8:29
