MySQL inefficient query in a large dataset

We have a MySQL table that looks something like this (minor columns removed):

 CREATE TABLE `my_data` (
   `auto_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
   `created_ts` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
   `updated_ts` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
   `data_txt` varchar(256) CHARACTER SET utf8 NOT NULL,
   `issued_ts` timestamp NULL DEFAULT NULL,
   `account_id` int(11) NOT NULL,
   PRIMARY KEY (`auto_id`),
   KEY `account_issued_idx` (`account_id`,`issued_ts`),
   KEY `account_issued_created_idx` (`account_id`,`issued_ts`,`created_ts`),
   KEY `account_created_idx` (`account_id`,`created_ts`),
   KEY `issued_idx` (`issued_ts`)
 ) ENGINE=InnoDB;

The table has approximately 900M rows, and a single account owns more than 65% of them. I've been asked to write queries over date ranges of created_ts and issued_ts, filtered by account_id; these timestamps seem to have a 1:1 functional dependency on the auto-increment key.

A typical query would look like this:

 SELECT *
   FROM my_data
  WHERE account_id = 1
    AND created_ts >  TIMESTAMP('2012-01-01')
    AND created_ts <= TIMESTAMP('2012-01-21')
  ORDER BY created_ts DESC
  LIMIT 100;

EXPLAIN on the query shows this:

 *************************** 1. row ***************************
            id: 1
   select_type: SIMPLE
         table: my_data
          type: range
 possible_keys: account_issued_idx,account_issued_created_idx,account_created_idx
           key: account_issued_created_idx
       key_len: 8
           ref: NULL
          rows: 365314721
         Extra: Using where

The problem is that the query takes too long and is eventually killed. I let it run to completion a couple of times, and it brought down the database node because the OS (Linux) ran out of swap space.

I have studied this problem repeatedly and tried splitting the query into uncorrelated subqueries, forcing indexes, using an explicit column list in the SELECT clause, and narrowing the date-range window, but the result is always the same: poor performance (too slow) and too much load on the host (it always dies).

My questions:

  • Is it possible to formulate a query that slices the data by date range and runs fast enough for a real-time call (< 1 s)?

  • Are there any optimizations I am missing that could help me reach the performance I've been asked for?

Any other suggestions, tips or thoughts are welcome.

Thanks

+4
5 answers

It seems that MySQL is using the wrong index for this query; try forcing another one:

 SELECT *
   FROM my_data FORCE INDEX (`account_created_idx`)
  WHERE account_id = 1
    AND created_ts >  TIMESTAMP('2012-01-01')
    AND created_ts <= TIMESTAMP('2012-01-21')
  ORDER BY created_ts DESC
  LIMIT 100;
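To check whether the hint actually changes the plan before running the full query, one option is to prefix it with EXPLAIN and compare the output with the plan shown in the question. This is a sketch that reuses the question's table and index names; the `\G` terminator is a feature of the `mysql` command-line client that prints one column per line, as in the question.

```sql
-- Show the plan for the forced-index version without executing the query.
EXPLAIN SELECT *
  FROM my_data FORCE INDEX (account_created_idx)
 WHERE account_id = 1
   AND created_ts >  TIMESTAMP('2012-01-01')
   AND created_ts <= TIMESTAMP('2012-01-21')
 ORDER BY created_ts DESC
 LIMIT 100\G
```

If the hint is honored, the `key` column should read `account_created_idx`. Keep in mind that `rows` is only the optimizer's estimate, not the number of rows actually read.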
+4

This question is years old. Still, there is a good answer.

The key to your struggle lies in your words "minor columns removed". When you run SELECT * ... ORDER BY X DESC LIMIT N, there is no such thing as a minor column, because the whole result set has to be retrieved and sorted. When you ask for all the columns of a complex table, that is a lot of data.

You have a good index for the WHERE clause. It would also be useful for the ORDER BY if it did not have DESC.

What you want is a deferred join. Start by retrieving only the ids of the rows you need:

 SELECT auto_id
   FROM my_data
  WHERE account_id = 1
    AND created_ts >  TIMESTAMP('2012-01-01')
    AND created_ts <= TIMESTAMP('2012-01-21')
  ORDER BY created_ts DESC
  LIMIT 100

This gives you a list of auto_id values for the rows you want. To order this list, MySQL only has to move the id and timestamp values around. That is a LOT less data to process.

Then you JOIN that list of ids back to your main table to fetch the full rows:

 SELECT a.*
   FROM my_data a
   JOIN (
         SELECT auto_id
           FROM my_data
          WHERE account_id = 1
            AND created_ts >  TIMESTAMP('2012-01-01')
            AND created_ts <= TIMESTAMP('2012-01-21')
          ORDER BY created_ts DESC
          LIMIT 100
        ) b ON a.auto_id = b.auto_id
  ORDER BY a.created_ts DESC

Try it. This will probably save you a lot of time.

If you know a priori that auto_id and created_ts increase together (both are monotonic), you can do even better. Your subquery can use

  ORDER BY auto_id DESC LIMIT 100 

This reduces the amount of data that has to be sorted even further.
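Put together, the variant described above might look like the following sketch. It assumes, as the answer does, that rows are inserted in created_ts order, so that sorting by auto_id inside the subquery picks the same 100 rows as sorting by created_ts; if that assumption does not hold, this query can return different rows from the original.

```sql
-- Deferred join that sorts by the primary key inside the subquery.
-- Only equivalent to ORDER BY created_ts DESC if auto_id and created_ts increase together.
SELECT a.*
  FROM my_data a
  JOIN (SELECT auto_id
          FROM my_data
         WHERE account_id = 1
           AND created_ts >  TIMESTAMP('2012-01-01')
           AND created_ts <= TIMESTAMP('2012-01-21')
         ORDER BY auto_id DESC
         LIMIT 100) b
    ON a.auto_id = b.auto_id
 ORDER BY a.created_ts DESC;
```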

Pro tip: avoid SELECT * in production systems; instead, list the columns that you really need. There are many reasons for this.
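Applied to the deferred join, the tip means naming columns in the outer SELECT only, since the inner query already fetches nothing but auto_id. In this sketch the column list is purely illustrative; which columns the application really needs is an assumption.

```sql
-- Deferred join with an explicit column list instead of a.*
-- (the chosen columns are an illustrative assumption).
SELECT a.auto_id, a.created_ts, a.issued_ts, a.data_txt
  FROM my_data a
  JOIN (SELECT auto_id
          FROM my_data
         WHERE account_id = 1
           AND created_ts >  TIMESTAMP('2012-01-01')
           AND created_ts <= TIMESTAMP('2012-01-21')
         ORDER BY created_ts DESC
         LIMIT 100) b
    ON a.auto_id = b.auto_id
 ORDER BY a.created_ts DESC;
```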

+1

Try MariaDB (or MySQL 5.6), because their optimizer can run this faster. I have been using it for several months, and for some queries like yours it is 1000% faster.

The feature you need is Index Condition Pushdown: http://kb.askmonty.org/en/index-condition-pushdown/
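Index Condition Pushdown lets the storage engine check conditions on later index columns before it fetches the full row. With the index the question's plan chose, (account_id, issued_ts, created_ts), that could mean evaluating the created_ts range inside the index. A sketch of checking and enabling it in a session follows; it assumes MySQL 5.6+ or MariaDB 5.3+, where the `index_condition_pushdown` flag exists and is on by default. On older versions the SET statement fails because the flag is unknown.

```sql
-- 1 if Index Condition Pushdown is enabled for this session.
SELECT @@optimizer_switch LIKE '%index_condition_pushdown=on%' AS icp_enabled;

-- Turn it on; naming a single flag leaves the other optimizer_switch flags unchanged.
SET SESSION optimizer_switch = 'index_condition_pushdown=on';

-- When ICP is used, the Extra column reports "Using index condition".
EXPLAIN SELECT *
  FROM my_data
 WHERE account_id = 1
   AND created_ts >  TIMESTAMP('2012-01-01')
   AND created_ts <= TIMESTAMP('2012-01-21')
 ORDER BY created_ts DESC
 LIMIT 100\G
```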

0

Do not use a function in the comparison. Calculate the timestamps yourself and compare against the literal values; otherwise the index cannot be used to compare created_ts, and that is the column that filters millions of rows out of the result set.
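The suggestion amounts to writing the bounds as plain literals. `TIMESTAMP('2012-01-01')` evaluates to `'2012-01-01 00:00:00'`, so the sketch below selects the same rows as the question's query. Note that MySQL generally evaluates a constant expression like this once before planning, so comparing the EXPLAIN output of both forms is the way to see whether the change makes a difference here.

```sql
-- Same filter as the original query, with literal timestamps instead of TIMESTAMP(...) calls.
SELECT *
  FROM my_data
 WHERE account_id = 1
   AND created_ts >  '2012-01-01 00:00:00'
   AND created_ts <= '2012-01-21 00:00:00'
 ORDER BY created_ts DESC
 LIMIT 100;
```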

0

I don't know why MySQL is (obviously) not using the best index. Besides forcing the index, could you try EXPLAIN on this variant:

 SELECT *
   FROM my_data
  WHERE account_id = 1
    AND created_ts >  TIMESTAMP('2012-01-01')
    AND created_ts <= TIMESTAMP('2012-01-21')
  ORDER BY account_id, created_ts DESC
  LIMIT 100;
0

Source: https://habr.com/ru/post/1414196/

