How can I delete expired data from a huge table without getting the log file out of control?

I have a huge table (3 billion rows) which, unfortunately, contains mostly expired data. I want to simply delete all of these expired rows and keep the rest.

I can execute the statement as follows:

delete from giganticTable where exp_date < getDate() 

The execution plan somehow estimates that about 400 million rows will be deleted.

When executed, not only did this not finish within an hour, but the database transaction log file also grew from 6 GB to 90 GB. Note that the database was in the bulk-logged recovery model while this was happening. In the end I canceled the query, as I am sure there must be a better way to do this.

I have several tables on which I need to perform a similar operation. What is the fastest and most effective way to simply delete these rows, given that I have absolutely no need to ever recover them?

Please note that I am using Microsoft SQL Server 2005.

+7
3 answers

When deleting from a table with a very large number of rows, I have found it useful to delete the rows in batches of, say, 5000 or so (I usually test to find the value that works fastest; sometimes it's 5000 rows, sometimes 10,000, etc.). This lets each delete operation complete quickly, rather than waiting a long time for one statement to delete 400 million records.

In SQL Server 2005, something like this should work (test it first):

 WHILE EXISTS (SELECT * FROM giganticTable WHERE exp_date < getDate())
 BEGIN
     DELETE TOP(5000) FROM giganticTable WHERE exp_date < getDate()
 END

I would watch how deleting in batches affects the size of the log file. If it still blows up the log, you can try changing the recovery model to Simple, deleting the records, and then switching back to Bulk Logged, but only if the system can tolerate the loss of some recent data. I would definitely take a full backup before attempting that procedure. This thread also suggests that you could set up a job to back up the log with truncation, so that may be another option. Hopefully you have an instance you can test on, but I would start with the batched deletes to see how they affect performance and the size of the log file.
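To make that concrete, here is a rough, untested sketch of a batched delete that takes a log backup every so often so log space can be reused instead of growing the file; the database name and backup path are placeholders, and the table and batch size come from the answer above:

    -- Batched delete: each small transaction commits quickly, and periodic
    -- log backups let SQL Server reuse log space rather than grow the file.
    DECLARE @batches int
    SET @batches = 0

    WHILE EXISTS (SELECT 1 FROM giganticTable WHERE exp_date < getDate())
    BEGIN
        DELETE TOP(5000) FROM giganticTable WHERE exp_date < getDate()

        SET @batches = @batches + 1
        IF @batches % 100 = 0
        BEGIN
            -- Placeholder database name and path; under full or bulk-logged
            -- recovery, log backups are what free log space for reuse.
            BACKUP LOG giganticDb TO DISK = 'D:\Backups\giganticDb_log.trn'
        END
    END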

+9

You really don't want to try anything risky like disabling logging while you are doing heavy work on the table, since any problem during a long-running job can easily lead to database corruption and other trouble. However, there is a way around your problem.

Create a temporary table that matches the schema of your real table. Populate it with the data you want to keep. Then truncate the original table (truncation is very fast and barely touches the log file). Finally, move the data from the temp table back into your original (and now empty) table.

If you use auto-incrementing (identity) primary keys, you will need to force the column to accept the original key values (so you don't run into problems later). A sketch of the whole sequence is below.
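A rough sketch of that approach, assuming the exp_date column from the question and a hypothetical identity column named id (the real column list would need to be filled in):

    -- 1. Copy the rows you want to keep into a work table
    --    (SELECT INTO is minimally logged under simple or bulk-logged recovery).
    SELECT *
    INTO   giganticTable_keep
    FROM   giganticTable
    WHERE  exp_date >= getDate()

    -- 2. Empty the original table; TRUNCATE is minimally logged and near-instant
    --    (it will fail if other tables reference this one via foreign keys).
    TRUNCATE TABLE giganticTable

    -- 3. Copy the kept rows back, preserving the original identity values.
    SET IDENTITY_INSERT giganticTable ON

    INSERT INTO giganticTable (id, exp_date /* , other columns... */)
    SELECT id, exp_date /* , other columns... */
    FROM   giganticTable_keep

    SET IDENTITY_INSERT giganticTable OFF

    DROP TABLE giganticTable_keep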

+3

You should have been doing this daily, so you wouldn't be facing such a huge job all at once. Since you are already in this situation, here are my suggestions:

  • Split the work into batches, as rsbarro says. You probably don't even need the WHILE statement; you can spread the deletes over a few days.
  • Specify the date explicitly (a range-based sketch follows this list):

     delete from giganticTable where exp_date < '2013-08-07' 
  • I don't have a good explanation for the huge log growth; a single giant delete just doesn't seem like a good way to do this.
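For example, a minimal, untested sketch of one day's run that deletes a single explicit date range (the dates are placeholders; pick windows small enough that each run stays manageable):

    -- Delete one explicit date range per run instead of everything at once.
    DELETE FROM giganticTable
    WHERE exp_date >= '2013-01-01'
      AND exp_date <  '2013-02-01'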
+1
