I have an Azure SQL database about 9 GB in size. It serves a web application that processes about 135 thousand requests per hour. Most of the data is temporary: it lives in the database for anywhere from several minutes to five days and is then deleted. About 10 GB of data moves through the database per day.
I tried to run a delete query on a table to remove about 250,000 of its 350,000 records. About 10 percent of the records have one or two nvarchar(max) values that are large enough to be stored in large object (LOB) storage.
Over the weekend, I tried to delete them all at once. The query ran for four hours before I canceled it, and then it spent another 8 hours rolling back - a bad move. I really did not expect it to be so bad.
Then I tried a different approach. The batch below ran at night, when the web application was processing about 100 thousand requests per hour. The tblJobs Id field is a uniqueidentifier and is the primary key.
    insert @tableIds
    select Id
    from dbo.tblJobs with(nolock)
    where (datediff(day, SchedDate, getDate()) > 60)
       or (datediff(day, ModifiedDate, getDate()) > 3 and ToBeRemoved = 1)

    set @maintLogStr = 'uspMaintenance [tblJobs] Obsolete J records count @tableIds: '
        + convert(nvarchar(12), (select count(1) from @tableIds))
    insert dbo.admin_MaintenanceLog(LogEntry) values(@maintLogStr)

    set @maintLogId = newid()
    set @maintLogStr = 'uspMaintenance [tblJobs] Obsolete J records beginning loop...'
    insert dbo.admin_MaintenanceLog(Id, LogEntry) values(@maintLogId, @maintLogStr)

    while exists(select * from @tableIds)
    begin
        delete @tableIdsTmp

        begin transaction
            insert @tableIdsTmp select top 1000 id from @tableIds

            delete p from @tableIdsTmp i join dbo.tblJobs p on i.id = p.Id
            delete x from @tableIdsTmp t join @tableIds x on t.id = x.id

            set @maintLogStr = 'uspMaintenance [tblJobs] Obsolete J records remaining count @tableIds: '
                + convert(nvarchar(12), (select count(1) from @tableIds))
            update dbo.admin_MaintenanceLog
            set LogEntry = @maintLogStr, RecordCreated = getdate()
            where Id = @maintLogId
        commit transaction

        if @dowaits = 1 WAITFOR DELAY '00:00:01.000'
    end
SchedDate, ModifiedDate, and ToBeRemoved are not indexed, so collecting identifiers in @tableIds takes about 3 minutes - not bad.
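As an aside, the selection filter can also be written without wrapping the date columns in datediff. This is only a sketch and has not been measured here. The variable names @today, @schedCutoff and @modCutoff are hypothetical and not part of the original procedure, and it assumes SchedDate and ModifiedDate are datetime columns.

```sql
-- Hypothetical cutoff variables (not in the original procedure).
declare @today date = cast(getdate() as date);
declare @schedCutoff datetime = dateadd(day, -60, @today);  -- midnight, 60 days ago
declare @modCutoff   datetime = dateadd(day, -3,  @today);  -- midnight, 3 days ago

-- datediff(day, x, getdate()) > n counts midnight boundaries crossed,
-- so it is true exactly when x is earlier than midnight of (today - n days).
-- NULL dates are excluded in both forms.
select Id
from dbo.tblJobs with(nolock)
where SchedDate < @schedCutoff
   or (ModifiedDate < @modCutoff and ToBeRemoved = 1);
```

With no index on these columns this still scans the table, so it would not be expected to change the 3 minutes by itself; it only makes the cutoffs explicit.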
According to the log entries, it then took 1 hour 55 minutes to delete 11,000 records from tblJobs, at which point the job, which is invoked from a remote machine, timed out.
Why is it taking so long? What can I do to speed it up?
tsql azure azure-sql-database
RJBreneman