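In theory, this could be random and fast. In practice, it is only fast, because without ORDER BY the server deletes rows in whatever order it happens to read them, not at random: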
DELETE FROM tableX LIMIT 4000
This will be random, but terribly slow on a table with 600K rows:
DELETE FROM tableX ORDER BY RAND() LIMIT 4000
This will not be truly random (since there are usually gaps in the id values), and it may not even delete exactly 4,000 rows (slightly fewer when there are many gaps), but it is probably faster than the previous one.
The extra wrapping in a derived table is required because the multiple-table DELETE syntax does not allow LIMIT:
DELETE td
FROM tableX AS td
  JOIN
    ( SELECT t.id
      FROM tableX AS t
        CROSS JOIN
          ( SELECT MAX(id) AS maxid FROM tableX ) AS m
        JOIN
          ( SELECT RAND() AS rndm FROM tableX AS tr LIMIT 5000 ) AS r
          ON t.id = CEIL( rndm * maxid )
      LIMIT 4000
    ) AS x
    ON x.id = td.id ;
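To see why this query can delete fewer than 4,000 rows, here is a small Python model of its logic. It is a simulation, not MySQL itself, and the function name and parameters are hypothetical. It draws 5,000 values of CEIL(RAND() * maxid), keeps those that match an existing id, stops after 4,000 matches as LIMIT 4000 does (assuming the join output follows the order of the random probes, as the <derived3> row being read first in the plan suggests), and then deduplicates, since a row matched twice is still deleted only once.

```python
import math
import random


def simulate_random_id_delete(ids, n_probes=5000, limit=4000, seed=None):
    """Hypothetical model of the derived-table DELETE above.

    Assumes the join output follows the order of the random probes.
    Returns the set of ids that would be deleted.
    """
    rng = random.Random(seed)
    existing = set(ids)
    max_id = max(existing)  # SELECT MAX(id) AS maxid
    joined = []             # rows produced by the inner JOIN, before dedup
    for _ in range(n_probes):
        # CEIL(RAND() * maxid): RAND() is in [0, 1), so this lands in 1..maxid
        # (0 only if RAND() returns exactly 0.0, which matches no id here)
        candidate = math.ceil(rng.random() * max_id)
        if candidate in existing:  # ON t.id = CEIL(rndm * maxid)
            joined.append(candidate)
            if len(joined) == limit:  # LIMIT 4000 on the derived table
                break
    # The outer join ON x.id = td.id deletes each matched row once,
    # so duplicate probes also shrink the final count.
    return set(joined)


if __name__ == "__main__":
    dense = range(1, 400_001)     # 400K rows, no gaps in the ids
    gappy = range(2, 800_001, 2)  # 400K rows, every other id missing
    print(len(simulate_random_id_delete(dense, seed=1)))  # close to 4000
    print(len(simulate_random_id_delete(gappy, seed=1)))  # well under 4000
```

With dense ids almost every probe hits, so the count stays just under 4,000 (only duplicate probes are lost). With half the ids missing, only about half of the 5,000 probes hit; the extra probes beyond 4,000 presumably give some headroom for gaps, but they cannot fully compensate when gaps are large.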
EXPLAIN output (for the subquery, on a table of 400 thousand rows):
id  select_type  table       type    possible_keys  key      key_len  ref   rows    Extra
1   PRIMARY      <derived2>  system                                               1
1   PRIMARY      <derived3>  ALL                                                  5000
1   PRIMARY      t           eq_ref  PRIMARY        PRIMARY  4        func  1       Using where; Using index
3   DERIVED      tr          index                  PRIMARY  4              398681  Using index
2   DERIVED                                                                         Select tables optimized away
ypercubeᵀᴹ