MySQL - large DELETE on multiple tables

I have 7 linked tables. One of them has a timestamp column, and I want to delete all rows older than 30 days. However, these are VERY big deletions: I'm talking about tens of millions of records. When I delete these records from the main table, I also have to go through the other 6 tables and delete the related records there.

What is the best way to optimize this?

I am thinking about using PARTITION, but only one table has a timestamp column. I'm worried that if I drop an old partition in the main table, the related records will still exist in the other six tables. Related records are linked by the fields (sid, cid).

For context, I use Snort and Barnyard, which are IDS tools.

I am using MySQL 5.1.73 with MyISAM tables.

Here is a snippet from the cleanup logs:

    StartTime,EndTime,TimeElapsed,AffectedRows
    Wed Jan 6 01:00:01 EST 2016,Wed Jan 6 01:45:11 EST 2016,45:10,2911807
    Thu Jan 7 01:00:02 EST 2016,Thu Jan 7 01:25:29 EST 2016,25:27,2230255
    Fri Jan 8 01:00:01 EST 2016,Fri Jan 8 01:24:18 EST 2016,24:17,1400470
    Sat Jan 9 01:00:02 EST 2016,Sat Jan 9 05:47:10 EST 2016,287:8,23360088
    Sun Jan 10 01:00:01 EST 2016,Sun Jan 10 10:06:16 EST 2016,546:15,44970072
    Mon Jan 11 01:00:01 EST 2016,Mon Jan 11 09:40:39 EST 2016,520:38,43948091

This was my old cleanup script:

    /usr/bin/mysql --defaults-extra-file=/old/.my.cnf snort_db >> /root/snortcleaner.log 2>&1 <<EOF
    use snort_db;
    DROP TRIGGER IF EXISTS delete_old;
    DELIMITER //
    CREATE TRIGGER delete_old AFTER DELETE ON event FOR EACH ROW
    BEGIN
      DELETE FROM data    WHERE data.cid = old.cid    AND data.sid = old.sid;
      DELETE FROM iphdr   WHERE iphdr.cid = old.cid   AND iphdr.sid = old.sid;
      DELETE FROM icmphdr WHERE icmphdr.cid = old.cid AND icmphdr.sid = old.sid;
      DELETE FROM tcphdr  WHERE tcphdr.cid = old.cid  AND tcphdr.sid = old.sid;
      DELETE FROM udphdr  WHERE udphdr.cid = old.cid  AND udphdr.sid = old.sid;
      DELETE FROM opt     WHERE opt.cid = old.cid     AND opt.sid = old.sid;
    END //
    DELIMITER ;
    EOF

    # Send the main MySQL command: deletes all records between the oldest timestamp and 31 days from now().
    # Gets the oldest timestamp and ranges a deletion from that to 31 days before now().
    # If the oldest timestamp is more recent than 31 days, the following command returns 0 anyway.
    # If it is older than 31 days, it will return them.
    OLDEST_TIMESTAMP=$(mysql --defaults-extra-file=/old/.my.cnf -Dsnort_db -se "SELECT timestamp FROM event ORDER BY timestamp ASC LIMIT 1;")
    NUM_AFFECTED=$(mysql --defaults-extra-file=/old/.my.cnf -Dsnort_db -se "DELETE FROM event WHERE timestamp BETWEEN DATE_SUB('${OLDEST_TIMESTAMP}', INTERVAL 1 HOUR) AND DATE_SUB(NOW(), INTERVAL 31 DAY); SELECT ROW_COUNT();")

This is my current cleanup script:

    DELETE FROM event
    WHERE timestamp BETWEEN DATE_SUB('${OLDEST_TIMESTAMP}', INTERVAL 1 HOUR)
                        AND DATE_SUB(NOW(), INTERVAL 31 DAY);

    DELETE FROM data    USING data    LEFT OUTER JOIN event USING (sid,cid) WHERE event.sid IS NULL;
    DELETE FROM iphdr   USING iphdr   LEFT OUTER JOIN event USING (sid,cid) WHERE event.sid IS NULL;
    DELETE FROM icmphdr USING icmphdr LEFT OUTER JOIN event USING (sid,cid) WHERE event.sid IS NULL;
    DELETE FROM tcphdr  USING tcphdr  LEFT OUTER JOIN event USING (sid,cid) WHERE event.sid IS NULL;
    DELETE FROM udphdr  USING udphdr  LEFT OUTER JOIN event USING (sid,cid) WHERE event.sid IS NULL;
    DELETE FROM opt     USING opt     LEFT OUTER JOIN event USING (sid,cid) WHERE event.sid IS NULL;

I switch between the two because I don't know which is faster, but in practice both are too slow.

5 answers

Try defining foreign keys with ON DELETE CASCADE, so you won't need to create a trigger or manually join and delete the related records.

The following is an example of creating a relationship that cascades a delete:

    CREATE TABLE parent (
        id INT NOT NULL,
        PRIMARY KEY (id)
    ) ENGINE=INNODB;

    CREATE TABLE child (
        id INT,
        parent_id INT,
        INDEX par_ind (parent_id),
        FOREIGN KEY (parent_id)
            REFERENCES parent(id)
            ON DELETE CASCADE
    ) ENGINE=INNODB;

(This example is from the MySQL documentation.)
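Applied to the tables in the question, this might look roughly like the sketch below. It rests on several assumptions: the tables are first converted to InnoDB (MyISAM parses FOREIGN KEY clauses but does not enforce them), `event` has a primary or unique key on (sid, cid), the column types match exactly in parent and child, and no orphaned child rows exist when the constraint is added. The constraint name `fk_data_event` is made up for illustration. Keep in mind that converting tables with tens of millions of rows is itself a long-running operation.

```sql
-- Sketch only: foreign keys need InnoDB on both the parent and the child table.
ALTER TABLE event ENGINE=InnoDB;
ALTER TABLE data  ENGINE=InnoDB;

-- Assumes event has a PRIMARY or UNIQUE key on (sid, cid)
-- and that data contains no rows without a matching event row.
ALTER TABLE data
  ADD CONSTRAINT fk_data_event          -- hypothetical constraint name
  FOREIGN KEY (sid, cid) REFERENCES event (sid, cid)
  ON DELETE CASCADE;

-- Repeat the two ALTER TABLE steps for iphdr, icmphdr, tcphdr, udphdr and opt.

-- Deleting from event now removes the matching child rows as well:
DELETE FROM event WHERE timestamp < DATE_SUB(NOW(), INTERVAL 31 DAY);
```

The cascade removes the need for the trigger, but the server still deletes every child row individually, so the total work is not necessarily smaller.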


We solved this problem by creating and dropping partitions. You partition the table by date (best practice is to automate partition creation with MySQL events), and when you need to delete old data, you just drop the oldest partitions. Dropping a partition is almost instantaneous and far cheaper than a row-by-row DELETE.
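A minimal sketch of what that could look like in MySQL 5.1 follows. The table `event_part` and the partition names are hypothetical, the column list is an assumption modeled on the question's `event` table, and the timestamp column is assumed to be DATETIME. Two caveats: MySQL requires every primary or unique key of a partitioned table to include the partitioning column, so the key becomes (sid, cid, timestamp); and the six related tables have no timestamp column, so they would still need a separate cleanup or a date column of their own.

```sql
-- Hypothetical partitioned copy of the event table, one partition per day.
CREATE TABLE event_part (
  sid       INT UNSIGNED NOT NULL,
  cid       INT UNSIGNED NOT NULL,
  timestamp DATETIME     NOT NULL,
  PRIMARY KEY (sid, cid, timestamp),   -- must include the partitioning column
  INDEX time (timestamp)
) ENGINE=MyISAM
PARTITION BY RANGE (TO_DAYS(timestamp)) (
  PARTITION p20160101 VALUES LESS THAN (TO_DAYS('2016-01-02')),
  PARTITION p20160102 VALUES LESS THAN (TO_DAYS('2016-01-03')),
  PARTITION pmax      VALUES LESS THAN MAXVALUE
);

-- Make room for the next day by splitting the catch-all partition.
ALTER TABLE event_part REORGANIZE PARTITION pmax INTO (
  PARTITION p20160103 VALUES LESS THAN (TO_DAYS('2016-01-04')),
  PARTITION pmax      VALUES LESS THAN MAXVALUE
);

-- Remove a whole day of old data without scanning individual rows.
ALTER TABLE event_part DROP PARTITION p20160101;
```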


How about saving the identifiers of the rows you are about to delete in a temporary table before deleting them?

You can then change the cleanup script from joining against the large table where the id IS NULL to joining against the small(er) table where the id IS NOT NULL.


I would do two things:

Define foreign keys in other tables with

 ON DELETE CASCADE 

and, instead of slicing the rows by hour, add a LIMIT so each deletion stays manageable:

 DELETE FROM event WHERE timestamp < DATE_SUB(NOW(), INTERVAL 31 DAY) LIMIT 500000 

Keep repeating it for as long as rows are affected, or as many times as experience tells you is needed.

Tune the 500000 to a batch size that your server handles without trouble.
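The repetition can be kept inside the server with a stored procedure, which MySQL 5.1 supports. The sketch below is one possible way to do it; the procedure name `purge_old_events` is made up, and the batch size is simply the figure used above. It only deletes from `event`, so it assumes the related rows are handled by cascading foreign keys or by a separate step.

```sql
DELIMITER //
CREATE PROCEDURE purge_old_events()    -- hypothetical name
BEGIN
  DECLARE affected INT DEFAULT 1;

  -- Delete in batches until a pass removes no rows.
  WHILE affected > 0 DO
    DELETE FROM event
    WHERE timestamp < DATE_SUB(NOW(), INTERVAL 31 DAY)
    LIMIT 500000;

    -- ROW_COUNT() reports how many rows the DELETE just removed.
    SET affected = ROW_COUNT();
  END WHILE;
END //
DELIMITER ;

CALL purge_old_events();
```

`DELIMITER` is a command of the mysql command-line client, so this form is meant to be run from that client or from a script fed to it.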


Change the script to:

  • make sure the key columns (sid, cid) are indexed in all tables
  • capture the key values of the rows you are about to remove from event
  • rather than targeting all the old rows, target at most a fixed (smallish) number of old rows, so each run completes relatively quickly
  • run the script often (say every 5 minutes, every hour, every day, whatever makes sense)

Something like:

    -- Rows are linked by (sid, cid), so capture both columns.
    -- Match the column types to sid and cid in your tables.
    CREATE TABLE IF NOT EXISTS deleted_cids (
        sid INT UNSIGNED NOT NULL,
        cid INT UNSIGNED NOT NULL,
        PRIMARY KEY (sid, cid)
    );
    TRUNCATE deleted_cids;

    -- Choose the largest LIMIT that gives an acceptable execution time.
    INSERT INTO deleted_cids
    SELECT sid, cid FROM event
    WHERE timestamp BETWEEN DATE_SUB('${OLDEST_TIMESTAMP}', INTERVAL 1 HOUR)
                        AND DATE_SUB(NOW(), INTERVAL 31 DAY)
    LIMIT 100000;

    DELETE event   FROM deleted_cids JOIN event   USING (sid, cid);
    DELETE data    FROM deleted_cids JOIN data    USING (sid, cid);
    DELETE iphdr   FROM deleted_cids JOIN iphdr   USING (sid, cid);
    DELETE icmphdr FROM deleted_cids JOIN icmphdr USING (sid, cid);
    DELETE tcphdr  FROM deleted_cids JOIN tcphdr  USING (sid, cid);
    DELETE udphdr  FROM deleted_cids JOIN udphdr  USING (sid, cid);
    DELETE opt     FROM deleted_cids JOIN opt     USING (sid, cid);

The advantage is that each DELETE is a single statement that uses an index to find all of its target rows, so it should be fast.

By adjusting the LIMIT and the execution frequency, you can find the right balance of server load. I would prefer frequent runs over smaller batches, so that the cleanup never overwhelms your server.

