Delete all entries older than 15 minutes

I have a table that receives about 10-15 thousand records per minute. Each record is stamped with the current timestamp on insert. The table is a MEMORY table, since data loss is not a concern.

A script runs the following query every minute:

 DELETE FROM tracker WHERE post_time < DATE_SUB(NOW(), INTERVAL 15 MINUTE) 

This query takes about 1-2 seconds to run, which is not bad, but it seems that this type of query (deleting anything older than X) should execute much faster against a MEMORY table. It also causes a corresponding CPU spike, which sticks out like a sore thumb every minute.

Are there any optimizations I can make so that this query executes more efficiently?

+7
3 answers

As always, you should look at the query plan and post it here. You get it by running EXPLAIN DELETE FROM tracker WHERE post_time < DATE_SUB(NOW(), INTERVAL 15 MINUTE)

The likely problem is that the DELETE query cannot use an index and has to scan all of your rows.

Even if you already have an index on post_time, it will most likely not be used, since by default indexes on MEMORY tables are hash indexes. Hash indexes can only be used for equality checks, not for range conditions such as post_time < DATE_SUB(NOW(), INTERVAL 15 MINUTE)

Create a BTREE index on the post_time column:

 CREATE INDEX post_time_idx ON tracker (post_time) USING BTREE; 
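
If an index on post_time already exists, it may be simpler to check its type and swap it for a B-tree in one step instead of adding a second index. The following is only a sketch: the name old_post_time_idx is hypothetical, so substitute whatever SHOW INDEX reports for your table.

```sql
-- The Index_type column reports HASH or BTREE for each index.
SHOW INDEX FROM tracker;

-- Hypothetical name: replace an existing hash index called old_post_time_idx
-- with a B-tree index on the same column.
ALTER TABLE tracker
  DROP INDEX old_post_time_idx,
  ADD INDEX post_time_idx USING BTREE (post_time);

-- Check that the range condition can now use the index.
-- EXPLAIN on a DELETE statement needs MySQL 5.6.3 or later.
EXPLAIN DELETE FROM tracker
WHERE post_time < DATE_SUB(NOW(), INTERVAL 15 MINUTE);
```

If the plan still shows no usable key, the index is probably still a hash index.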
+6

Create an index on post_time. If the rows matching post_time < DATE_SUB(NOW(), INTERVAL 15 MINUTE) make up only a small part of the table, this should speed things up significantly.

+1

If your table never holds data for more than 15 minutes, you can store the timestamps in a smaller data type than DATETIME. Depending on the granularity you care about, the type can be very small. With SMALLINT you can store "minutes since midnight" (0 to 1439). If you are willing to lose even more detail, you can use TINYINT with 15-minute granularity. Of course, this requires slightly more complex logic to handle the “right after midnight” case, where the cutoff wraps around to the previous day:

 -- Assumes post_time holds minutes since midnight (0..1439).
 SET @now_min = HOUR(NOW()) * 60 + MINUTE(NOW());
 DELETE FROM tracker
 WHERE (@now_min >= 15 AND (post_time < @now_min - 15 OR post_time > @now_min))
    OR (@now_min < 15 AND post_time > @now_min AND post_time < @now_min + 1425);
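
For reference, the two compact encodings described above could be computed at insert time roughly as follows. This is a sketch, not a drop-in change: the aliases minute_of_day and quarter_hour_slot are made up for illustration, and your insert logic would need to write one of these values into post_time.

```sql
SELECT
  -- Minutes since midnight, 0..1439: fits in a SMALLINT.
  HOUR(NOW()) * 60 + MINUTE(NOW()) AS minute_of_day,
  -- 15-minute slot of the day, 0..95: fits in a TINYINT.
  (HOUR(NOW()) * 60 + MINUTE(NOW())) DIV 15 AS quarter_hour_slot;
```

With the TINYINT variant a row's age is only known to within one slot, so the cleanup has to keep at least the current and the previous slot to avoid deleting rows younger than 15 minutes.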

The advantage is that the data you need to read and compare is much smaller, so you can process it faster. This matters more if you store your data on disk, where disk I/O is proportionally far more expensive than memory bandwidth.

That said, for a table with only 10-15 thousand rows and a suitable index, I doubt this will make a noticeable difference, whether on disk or in memory.

+1
