The most efficient (fast) T-SQL DELETE for many rows?

Our server application receives row data to add to the database at a rate of 1000-2000 rows per second throughout the day. There are two mutually exclusive columns in the table that uniquely identify a row: one is a numeric identifier called "tag" and the other is a 50-character string column called "longTag". A row can have either a tag or a longTag, but not both.

Each row that comes in over the socket may or may not already exist in the table. If it exists, that row should be updated with the new information; if it does not, it must be inserted. We are on SQL Server 2005, and in some cases even SQL Server 2000, so we cannot use the new MERGE keyword.

Right now I handle this by building a giant DELETE statement that looks like this:

DELETE from MyRecords WHERE tag = 1 OR tag = 2 OR longTag = 'LongTag1' OR tag = 555 

... where each incoming row adds its own OR tag = n or OR longTag = 'x' clause.

Then I bulk-load all the new records at once from XML using ISQLXMLBulkLoad.

The giant DELETE statement sometimes times out, taking 30 seconds or longer. I'm not sure why.

As rows come in off the socket, they must either be inserted or replace existing rows. Is the way I'm doing this the best way to do it?

EDIT: The ratio of new rows to replacement rows will be heavily skewed toward new rows. In the production data I have seen, there are typically 100-1000 new rows for every update.

EDIT 2: The inserts and deletes must be treated as a single transaction. If either the insert or the delete fails, both must be rolled back, leaving the table in the state it was in before the insert and delete began.

EDIT 3: Regarding NULL tags. First I need to describe the system a bit more. This is the database for a trading system. MyTable is the trades table, containing two kinds of records: so-called "day trades" and so-called "opening positions". Day trades are simply trades: if you were an options trader and made a trade, that trade would be a day trade in this system. Opening positions are essentially a summary of your portfolio as of the start of the day. Both opening positions and day trades are stored in the same table. Day trades have tags (either long or numeric tags), but opening positions do not. There may be duplicate rows for opening positions; that is normal and expected. But there can be no duplicate rows for day trades. If a day trade arrives with the same tag as a record already in the database, the data in the table is replaced with the new data.

Thus, there are 4 possibilities for the values in tag and longTag:

1) tag is non-zero and longTag is empty: this is a day trade with a numeric identifier.

2) tag is zero and longTag has a non-empty character value: this is a day trade with an alphanumeric identifier.

3) tag is zero and longTag is empty: this is an opening position.

4) tag is non-zero and longTag has a non-empty character value. This is prevented by our server software, but if it were to happen, the longTag would be ignored and the row handled the same as case #1. Again, this does not happen.

+4

8 answers

OR (or IN) behaves almost as if each OR operand were a separate query. That is, it turns into a table scan: for each row, the database must evaluate each OR operand as a predicate until it finds a match or runs out of operands.

The only reason to do it this way is to make it one logical unit of work. You could just as well wrap a bunch of individual deletes in a transaction and commit only after they all succeed.

Quassnoi makes an interesting suggestion (use a table), but since it uses INs and ORs, it comes out the same.

But try this instead.

Create a new table mirroring the real table. Call it u_real_table. Index it on tag and on longTag.

Put all the incoming data into u_real_table.

Now, when you are ready to do the main work, join the mirror table to the real table on tag instead. From the real table, delete all the tagged rows that appear in u_real_table:

 delete a
 from real_table a
 join u_real_table b on (a.tag = b.tag);

 insert into real_table
 select * from u_real_table
 where tag is not null;

See what we did there? Since we join only on tag, there is a better chance that the tag index will be used.

First we delete all the rows being replaced, then insert their replacements. We could also do an UPDATE here instead; which is faster depends on the table structure and its indexes.

We did not need to generate any SQL for this; we just needed to insert the records into u_real_table.

Now we do the same for longTags:

 delete a
 from real_table a
 join u_real_table b on (a.longTag = b.longTag);

 insert into real_table
 select * from u_real_table
 where longTag is not null;

Finally, we clear the u_real_table:

 delete from u_real_table; 

Obviously, we wrap each delete/insert pair in a transaction, so that the delete only becomes real once the subsequent insert succeeds, and then we wrap the whole thing in an outer transaction, because it is one logical unit of work.

This method cuts down on your handwritten SQL, reduces the chance of manual error, and has a decent chance of speeding up the deletes.

Note that this assumes missing tags and longTags are properly NULL, not zero or empty strings.
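To make the pattern concrete, here is a small runnable sketch using Python's sqlite3 in place of SQL Server (sqlite's DELETE does not support a JOIN clause, so the join is expressed as an IN subquery; table names follow the answer, the payload column is invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE real_table (tag INTEGER, longTag TEXT, payload TEXT);
    CREATE INDEX ix_tag ON real_table (tag);
    CREATE INDEX ix_longTag ON real_table (longTag);
    CREATE TABLE u_real_table (tag INTEGER, longTag TEXT, payload TEXT);

    INSERT INTO real_table VALUES (1, NULL, 'old-1');
    INSERT INTO real_table VALUES (NULL, 'LT1', 'old-LT1');

    -- stage the incoming batch in the mirror table
    INSERT INTO u_real_table VALUES (1, NULL, 'new-1');
    INSERT INTO u_real_table VALUES (NULL, 'LT1', 'new-LT1');
""")

with con:  # one transaction around the whole delete/insert sequence
    # tag pass: delete matched rows, then insert their replacements
    con.execute("DELETE FROM real_table WHERE tag IN "
                "(SELECT tag FROM u_real_table WHERE tag IS NOT NULL)")
    con.execute("INSERT INTO real_table "
                "SELECT * FROM u_real_table WHERE tag IS NOT NULL")
    # longTag pass
    con.execute("DELETE FROM real_table WHERE longTag IN "
                "(SELECT longTag FROM u_real_table WHERE longTag IS NOT NULL)")
    con.execute("INSERT INTO real_table "
                "SELECT * FROM u_real_table WHERE longTag IS NOT NULL")
    # finally, clear the staging table
    con.execute("DELETE FROM u_real_table")

payloads = sorted(p for (p,) in con.execute("SELECT payload FROM real_table"))
print(payloads)  # both old rows replaced by the staged versions
```

The IS NOT NULL filters keep untagged opening-position rows out of the dedup passes, matching the note above.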

+4

I think splitting the giant DELETE statement into two DELETEs might help:

One DELETE to handle tag and a separate DELETE to handle longTag. This will help SQL Server use the indexes efficiently.

Of course, you can still send the two DELETE statements in a single round trip to the database.

Hope this helps.
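As a sanity check on the split, this Python sqlite3 sketch (names taken from the question, data invented) confirms that two per-column DELETEs remove exactly the same rows as the single OR form:

```python
import sqlite3

def make_db():
    # toy MyRecords table with a few tagged and longTag'd rows (data invented)
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE MyRecords (tag INTEGER, longTag TEXT)")
    con.executemany(
        "INSERT INTO MyRecords VALUES (?, ?)",
        [(1, None), (2, None), (3, None), (None, "LongTag1"), (None, "LongTag2")],
    )
    return con

# single statement with ORs (the original approach)
a = make_db()
a.execute("DELETE FROM MyRecords WHERE tag IN (1, 2) OR longTag IN ('LongTag1')")

# two per-column statements (each can seek on its own index)
b = make_db()
b.execute("DELETE FROM MyRecords WHERE tag IN (1, 2)")
b.execute("DELETE FROM MyRecords WHERE longTag IN ('LongTag1')")

# key=repr because tuples mixing ints and None are not directly comparable
left = sorted(a.execute("SELECT tag, longTag FROM MyRecords").fetchall(), key=repr)
right = sorted(b.execute("SELECT tag, longTag FROM MyRecords").fetchall(), key=repr)
print(left == right)
```

The surviving rows are identical either way; only the access path differs.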

+5

Maybe:

 DELETE FROM MyRecords
 WHERE tag IN (1, 2, 555) -- build this list from the incoming rows
    OR longTag IN ('LongTag1')

I suspect that indexes would help your deletes but slow down your inserts drastically, so I wouldn't play with that too much. Then again, my intuition here is hardly perfect; you may be able to tune FillFactor or other settings to work around the problem. The only thing I know for sure is that you really want to profile it either way.

Another option is to load the new rows into a staging table (named something like InputQueue), and then join the staging table to MyRecords to handle the filtering of updates. This also makes it easy to do the update in two steps: you can delete by tag and by longTag as separate operations, which can be much more efficient.

+3

Something like this could simplify the whole process (you just insert the rows, regardless of whether they already exist; no separate DELETE statement is needed):

 CREATE TRIGGER dbo.TR_MyTable_Merge
 ON dbo.MyTable
 INSTEAD OF INSERT
 AS
 BEGIN
     SET NOCOUNT ON;

     BEGIN TRANSACTION

     DELETE t
     FROM MyTable t
     INNER JOIN inserted i ON t.tag = i.tag

     DELETE t
     FROM MyTable t
     INNER JOIN inserted i ON t.longTag = i.longTag

     INSERT MyTable
     SELECT * FROM inserted

     COMMIT TRANSACTION

     SET NOCOUNT OFF;
 END

EDIT: The original combined DELETE statement has been split into two separate statements to ensure optimal index usage.

Not using DELETE at all, and instead UPDATEing the affected/duplicate rows, would also be easier on the indexes.
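The same replace-on-insert idea can be sketched with sqlite, which attaches row-level BEFORE INSERT triggers to tables rather than T-SQL's INSTEAD OF; the trigger and column names here are illustrative:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE MyTable (tag INTEGER, longTag TEXT, payload TEXT);

    -- before each insert, delete any existing row with the same tag or
    -- longTag, so plain INSERTs behave as upserts
    CREATE TRIGGER tr_mytable_merge BEFORE INSERT ON MyTable
    BEGIN
        DELETE FROM MyTable WHERE NEW.tag IS NOT NULL AND tag = NEW.tag;
        DELETE FROM MyTable WHERE NEW.longTag IS NOT NULL
                               AND longTag = NEW.longTag;
    END;
""")

con.execute("INSERT INTO MyTable VALUES (1, NULL, 'old')")
con.execute("INSERT INTO MyTable VALUES (1, NULL, 'new')")     # replaces 'old'
con.execute("INSERT INTO MyTable VALUES (NULL, NULL, 'pos')")  # opening position
con.execute("INSERT INTO MyTable VALUES (NULL, NULL, 'pos')")  # duplicates allowed

rows = sorted(p for (p,) in con.execute("SELECT payload FROM MyTable"))
print(rows)
```

The NEW.tag IS NOT NULL guards keep untagged opening-position rows from ever being deduplicated, matching EDIT 3 in the question.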

+3

Watch this video, which demonstrates "nibbling" deletes. The process works well and can definitely reduce the locking/contention issues you are seeing:

http://www.sqlservervideos.com/video/nibbling-deletes
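In essence, a nibbling delete removes rows in small batches in a loop, so each statement holds its locks only briefly. A minimal Python sqlite3 sketch of the loop (batch size, table, and predicate are arbitrary):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE MyRecords (tag INTEGER)")
# 1000 rows, 100 of which match the predicate tag = 3
con.executemany("INSERT INTO MyRecords VALUES (?)",
                [(i % 10,) for i in range(1000)])
con.commit()

BATCH = 50
deleted = 0
while True:
    # delete at most BATCH matching rows per pass;
    # each pass is its own short transaction
    cur = con.execute(
        "DELETE FROM MyRecords WHERE rowid IN "
        "(SELECT rowid FROM MyRecords WHERE tag = 3 LIMIT ?)", (BATCH,))
    con.commit()
    if cur.rowcount == 0:
        break
    deleted += cur.rowcount

remaining = con.execute(
    "SELECT COUNT(*) FROM MyRecords WHERE tag = 3").fetchone()[0]
print(deleted, remaining)
```

On SQL Server the same shape is typically written with DELETE TOP (n) in a loop; the point is the short, repeated transactions rather than one long one.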

+3

It sounds like your table is not indexed on (tag) and (longTag).

Create two indexes: one on (tag), one on (longTag).

If you plan to delete a really large number of records, then declare two table variables, fill them with values, and delete like this:

 DECLARE @tag TABLE (id INT);
 DECLARE @longTag TABLE (id VARCHAR(50));

 INSERT INTO @tag VALUES (1)
 INSERT INTO @tag VALUES (2)
 /* ... */
 INSERT INTO @longTag VALUES ('LongTag1')
 /* ... */

 DELETE FROM MyRecords
 WHERE tag IN (SELECT id FROM @tag)
    OR longTag IN (SELECT id FROM @longTag)

You can also try a two-pass DELETE :

 DELETE FROM MyRecords
 WHERE tag IN (SELECT id FROM @tag)

 DELETE FROM MyRecords
 WHERE longTag IN (SELECT id FROM @longTag)

and see which statement runs longer, to tell whether there is a problem with the indexes.

+2

Using ORs can cause a table scan. Can you split it into four statements? Wrapping them all in a transaction may also speed things up.

 DELETE FROM MyRecords WHERE tag = 1
 DELETE FROM MyRecords WHERE tag = 2
 DELETE FROM MyRecords WHERE tag = 555
 DELETE FROM MyRecords WHERE longTag = 'LongTag1'
+2

Indexing:

Consider a persisted, indexed computed column for longTag that stores a checksum of longTag. Instead of indexing "LongTag1", you index the 4-byte int value (86939596).

Your lookups should then [hopefully*] be faster, and you just include the longTag value in the query/delete as well. Your code gets somewhat more complex, but the indexing is likely to be much more efficient.

* Testing required
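A sketch of the idea using Python's sqlite3, with zlib.crc32 standing in for the checksum that T-SQL would compute in a persisted computed column: the narrow 4-byte column is what gets indexed, and the full longTag is re-checked to guard against checksum collisions. Column and table names are illustrative.

```python
import sqlite3
import zlib

def checksum(s: str) -> int:
    # 4-byte checksum standing in for a computed CHECKSUM(longTag) column
    return zlib.crc32(s.encode("utf-8"))

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE MyRecords (longTag TEXT, longTagChk INTEGER)")
con.execute("CREATE INDEX ix_chk ON MyRecords (longTagChk)")  # narrow int index

for lt in ("LongTag1", "LongTag2", "LongTag3"):
    con.execute("INSERT INTO MyRecords VALUES (?, ?)", (lt, checksum(lt)))

# seek on the int index first, then confirm the full string (collision guard)
target = "LongTag2"
con.execute("DELETE FROM MyRecords WHERE longTagChk = ? AND longTag = ?",
            (checksum(target), target))

remaining = sorted(t for (t,) in con.execute("SELECT longTag FROM MyRecords"))
print(remaining)
```

The same double predicate (checksum equality plus string equality) is what makes the narrow index safe despite possible hash collisions.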

+1
