We have a MySQL table that looks something like this (minor columns removed):
    CREATE TABLE `my_data` (
      `auto_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
      `created_ts` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
      `updated_ts` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
      `data_txt` varchar(256) CHARACTER SET utf8 NOT NULL,
      `issued_ts` timestamp NULL DEFAULT NULL,
      `account_id` int(11) NOT NULL,
      PRIMARY KEY (`auto_id`),
      KEY `account_issued_idx` (`account_id`,`issued_ts`),
      KEY `account_issued_created_idx` (`account_id`,`issued_ts`,`created_ts`),
      KEY `account_created_idx` (`account_id`,`created_ts`),
      KEY `issued_idx` (`issued_ts`)
    ) ENGINE=InnoDB;

The table has approximately 900 million rows, and a single account holds more than 65% of them. I have been asked to write date-range queries on both created_ts and issued_ts that filter by account_id, which appears to have a 1:1 functional dependency on the auto-increment key.
A typical query would look like this:
    SELECT *
    FROM my_data
    WHERE account_id = 1
      AND created_ts > TIMESTAMP('2012-01-01')
      AND created_ts <= TIMESTAMP('2012-01-21')
    ORDER BY created_ts DESC
    LIMIT 100;

Running EXPLAIN on the query shows this:

    *************************** 1. row ***************************
               id: 1
      select_type: SIMPLE
            table: my_data
             type: range
    possible_keys: account_issued_idx, account_issued_created_idx, account_created_idx,
              key: account_issued_created_idx
          key_len: 8
              ref: NULL
             rows: 365314721
            Extra: Using where
The problem is that the query takes far too long and is ultimately killed. I have let it run to completion a couple of times, and it brings down the database node because the OS (Linux) runs out of swap space.
I have researched this problem repeatedly and tried splitting the query into uncorrelated subqueries, forcing indexes, using an explicit column list in the SELECT clause, and narrowing the date range window, but the result is always the same: poor performance (too slow) and too much load on the host (it always dies).
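For reference, a forced-index variant of the typical query above can be sketched as follows. This is only an illustration: the choice of `account_created_idx` (the index whose columns match both the WHERE clause and the ORDER BY) and the particular column list are assumptions, not a record of the exact statements that were run.

```sql
-- Sketch only: hint the optimizer toward the (account_id, created_ts) index
-- instead of account_issued_created_idx, which EXPLAIN reported as chosen.
-- The column list is illustrative; it replaces SELECT * with named columns.
SELECT auto_id, created_ts, issued_ts, data_txt
FROM my_data FORCE INDEX (account_created_idx)
WHERE account_id = 1
  AND created_ts > TIMESTAMP('2012-01-01')
  AND created_ts <= TIMESTAMP('2012-01-21')
ORDER BY created_ts DESC
LIMIT 100;
```

Prefixing the statement with EXPLAIN shows whether the hint was honored: the `key` column should then read `account_created_idx`.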
My question(s):
Is it possible to formulate a query that slices the data by date range and performs acceptably for a real-time call (<1s)?
Are there any optimizations I am missing that could help achieve the performance I am being asked to deliver?
Any other suggestions, tips or thoughts are welcome.
Thanks.