MySQL slow on join. Any way to speed it up?

I have 2 tables. 1 is music, and 2 is listenTrack. listenTrack tracks the unique plays of each song. I am trying to get results for popular songs of the month. I get my results, but they just take too long. Below are my tables and query.

430,000 rows

CREATE TABLE `listentrack` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `sessionId` varchar(50) NOT NULL,
  `url` varchar(50) NOT NULL,
  `date_created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `ip` varchar(150) NOT NULL,
  `user_id` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=731306 DEFAULT CHARSET=utf8;

12,500 rows

CREATE TABLE `music` (
  `music_id` int(11) NOT NULL AUTO_INCREMENT,
  `user_id` int(11) NOT NULL,
  `title` varchar(50) DEFAULT NULL,
  `artist` varchar(50) DEFAULT NULL,
  `description` varchar(255) DEFAULT NULL,
  `genre` int(4) DEFAULT NULL,
  `file` varchar(255) NOT NULL,
  `url` varchar(50) NOT NULL,
  `allow_download` int(2) NOT NULL DEFAULT '1',
  `plays` bigint(20) NOT NULL,
  `downloads` bigint(20) NOT NULL,
  `faved` bigint(20) NOT NULL,
  `dateadded` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`music_id`)
) ENGINE=MyISAM AUTO_INCREMENT=15146 DEFAULT CHARSET=utf8;

SELECT COUNT(listenTrack.url) AS total, listenTrack.url
FROM listenTrack
LEFT JOIN music ON music.url = listenTrack.url
WHERE DATEDIFF(DATE(date_created),'2009-08-15') = 0
GROUP BY listenTrack.url
ORDER BY total DESC
LIMIT 0,10;

This query is not very complicated, and the tables are not too large, I don't think.

Is there any way to speed this up? Or can you offer a better solution? This will run as a cron job at the beginning of each month, but I would also like to produce results per day.

Oh, by the way: when I run it locally it takes 4 minutes, but on prod it takes about 45 seconds.

+6
date join php mysql
9 answers

I am more of a SQL Server guy, but these concepts should apply.

I would add indexes:

  • On listentrack, add an index on url and date_created
  • On music, add an index on url

These indexes should significantly speed up the query (I initially mixed up the table names; fixed in the latest edit).

+10

As a rule, you should also index any column that is used in a JOIN. In your case, you should index both listentrack.url and music.url.

@jeff s - an index on date_created alone will not help, because the query wraps that column in a function, so MySQL cannot use an index on it. Often you can rewrite the query so that the indexed column is referenced directly, like:

 DATEDIFF(DATE(date_created),'2009-08-15') = 0 

becomes

 date_created >= '2009-08-15' and date_created < '2009-08-16' 

This selects only the records from 2009-08-15 and allows any index on that column to be a candidate. Please note: MySQL might still not use the index; that depends on other factors.
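For illustration, here is a sketch of the question's query with the date test rewritten as a range. It assumes the listentrack and music tables exactly as posted and changes nothing else; the commented month range is my guess at what the monthly cron run would use.

```sql
-- Same query as in the question, but date_created is compared directly,
-- so an index that includes date_created can be considered by the optimizer.
SELECT COUNT(listentrack.url) AS total, listentrack.url
FROM listentrack
LEFT JOIN music ON music.url = listentrack.url
WHERE listentrack.date_created >= '2009-08-15'
  AND listentrack.date_created <  '2009-08-16'
GROUP BY listentrack.url
ORDER BY total DESC
LIMIT 0, 10;

-- For the monthly report the same half-open range pattern applies, e.g.:
--   date_created >= '2009-08-01' AND date_created < '2009-09-01'
```

The half-open range (>= start, < next day) avoids having to spell out 23:59:59 and catches every timestamp within the day.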

It's best to create a composite index on listentrack(url, date_created) and then another index on music.url

These 2 indexes will cover this particular query.
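As a sketch, the two indexes could be created like this. The index names are made up for the example; pick whatever naming scheme you already use.

```sql
-- Composite index: url first, then date_created (hypothetical index name).
CREATE INDEX idx_listentrack_url_date ON listentrack (url, date_created);

-- Single-column index for the join side (hypothetical index name).
CREATE INDEX idx_music_url ON music (url);
```

Building an index on a MyISAM table locks it for writes while the index is created, so on the 430,000-row table you may prefer to run this at a quiet time.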

Note that if you run EXPLAIN on this query, you will still see Using filesort, because MySQL has to collect the grouped counts and then sort them in order to execute the ORDER BY (despite the name, that sort does not necessarily happen on disk).

In general, you should always run your query through EXPLAIN to get an idea of how MySQL will execute it, and go from there. See the EXPLAIN documentation:

http://dev.mysql.com/doc/refman/5.0/en/using-explain.html
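A minimal way to try this, assuming the tables from the question, is to prefix the query with EXPLAIN. I am not showing output here because it depends on your data and indexes.

```sql
-- Ask MySQL how it plans to run the query instead of running it.
EXPLAIN
SELECT COUNT(listentrack.url) AS total, listentrack.url
FROM listentrack
LEFT JOIN music ON music.url = listentrack.url
WHERE listentrack.date_created >= '2009-08-15'
  AND listentrack.date_created <  '2009-08-16'
GROUP BY listentrack.url
ORDER BY total DESC
LIMIT 0, 10;
```

In the result, the key column shows which index (if any) was chosen for each table, rows is the estimated number of rows examined, and Extra carries notes such as Using index, Using temporary, or Using filesort.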

+5

Try creating an index that helps with the join:

 CREATE INDEX idx_url ON music (url); 
+4

I think I may have missed the obvious before. Why are you even joining the music table? It seems you are not using any data from that table at all, and you are doing a left join, which does not filter anything out, right? I think having that table in the query makes it much slower and adds no value. Take out all references to music, unless the url is required to exist there, in which case you need a right join so that rows without a matching value are not included (an inner join would also do this).
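As a sketch, the query with the music table removed would look like this. It assumes, as in the question's query, that only the url and its count are needed; the date range form is used here instead of DATEDIFF.

```sql
-- No join: the counts come from listentrack alone.
SELECT COUNT(*) AS total, url
FROM listentrack
WHERE date_created >= '2009-08-15'
  AND date_created <  '2009-08-16'
GROUP BY url
ORDER BY total DESC
LIMIT 0, 10;
```

Because listentrack.url is NOT NULL, COUNT(*) gives the same totals as COUNT(url) here.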


I would add new indexes, as others say. In particular, I would add: music (url) and listentrack (date_created, url).

This will improve your join a ton.

Then I would look at the query: you are forcing the system to do work on every row of the table. It would be better to rephrase the date restriction as a range.

Not sure about the syntax off the top of my head: where date_created >= '2009-08-15 00:00:00' and date_created < '2009-08-16 00:00:00'

This should allow it to use the index to find the matching records quickly. The combined two-column index on listentrack should allow it to find records based on date and url. You should experiment; it might be better to go the other direction, (url, date_created), in the index.

The EXPLAIN plan for this query should say "Using index" in the rightmost column for both tables. This means that it does not need to hit the table data to calculate your counts.

I would also check the memory settings that you have configured for MySQL. It sounds like you may not have allocated enough memory. Be very careful about the difference between server-wide settings and per-thread settings. A 10 MB server-wide cache is quite small, while a 10 MB per-thread cache can quickly consume a lot of memory.
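To see what is currently configured, something like the following can be run. These three variables are just examples I would look at first for a MyISAM table; they are not named in the question.

```sql
-- Server-wide: the MyISAM index block cache, shared by all connections.
SHOW GLOBAL VARIABLES LIKE 'key_buffer_size';

-- Per connection: allocated by each session that needs to sort.
SHOW SESSION VARIABLES LIKE 'sort_buffer_size';

-- Limit for in-memory temporary tables (used by GROUP BY on larger results).
SHOW SESSION VARIABLES LIKE 'tmp_table_size';
```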

Jacob

+3

Pre-grouping and then joining makes things much faster with MySQL / MyISAM. (I suspect this is not needed with other databases.)

This should be as fast as the non-joined version:

 SELECT total, a.url, title
 FROM (
   SELECT COUNT(*) AS total, url
   FROM listenTrack
   WHERE DATEDIFF(DATE(date_created),'2009-08-15') = 0
   GROUP BY url
   ORDER BY total DESC
   LIMIT 0,10
 ) AS a
 LEFT JOIN music ON music.url = a.url;

PS - Joining the two tables on an id instead of the url is sound advice.

+2

Why are you repeating the url in both tables?

Have listentrack hold music_id instead, and join on that. This gets rid of the text comparison, as well as the extra index.

Besides, it is arguably more correct. You are tracking listens of a specific track, not of a URL. What if the URL changes?
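A rough sketch of that change, assuming each url appears only once in music (the posted schema does not enforce this) and using a made-up index name:

```sql
-- Add the reference column; NULL until backfilled.
ALTER TABLE listentrack ADD COLUMN music_id INT(11) NULL;

-- Backfill from the existing url match. date_created is assigned to itself
-- because the column is declared ON UPDATE CURRENT_TIMESTAMP and would
-- otherwise be overwritten with the time of this UPDATE.
UPDATE listentrack lt
JOIN music m ON m.url = lt.url
SET lt.music_id = m.music_id,
    lt.date_created = lt.date_created;

-- Integer index instead of the varchar one (hypothetical index name).
CREATE INDEX idx_listentrack_music_date ON listentrack (music_id, date_created);

-- Top 10 for one day, grouped by track id instead of url.
SELECT COUNT(*) AS total, lt.music_id
FROM listentrack lt
WHERE lt.date_created >= '2009-08-15'
  AND lt.date_created <  '2009-08-16'
  AND lt.music_id IS NOT NULL
GROUP BY lt.music_id
ORDER BY total DESC
LIMIT 0, 10;
```

Take a backup before running the UPDATE; the code that inserts into listentrack would also need to start writing music_id.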

+1

After adding indexes, you may want to look into adding a new column that stores date_created as a unix_timestamp, which would make the date math faster.

I'm not sure why you are using the DATEDIFF function, since it looks like you just want all the rows that were updated on a specific date.

You might want to double-check your query, as it seems to have an error.

If you use unit tests, you can compare the results of your current query with those of the query using the unix timestamp.
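A sketch of what that could look like; the column name created_unix and the index name are invented for the example:

```sql
-- Integer copy of date_created (hypothetical column name).
ALTER TABLE listentrack ADD COLUMN created_unix INT UNSIGNED NULL;

-- Backfill. date_created is set to itself so that its
-- ON UPDATE CURRENT_TIMESTAMP clause does not overwrite it.
UPDATE listentrack
SET created_unix = UNIX_TIMESTAMP(date_created),
    date_created = date_created;

-- Hypothetical index name.
CREATE INDEX idx_listentrack_created_unix ON listentrack (created_unix);

-- One day expressed as a plain integer range.
SELECT COUNT(*) AS total, url
FROM listentrack
WHERE created_unix >= UNIX_TIMESTAMP('2009-08-15')
  AND created_unix <  UNIX_TIMESTAMP('2009-08-16')
GROUP BY url
ORDER BY total DESC
LIMIT 0, 10;
```

Whether this beats a plain range on the indexed timestamp column is something to measure rather than assume.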

0

You can add an index on the url field in both tables.

That said, when I migrated from MySQL to SQL Server 2008, with the same queries and the same database structures, the queries ran 1-3 orders of magnitude faster.

I think some of that is down to the RDBMS itself (MySQL's optimizer is not that good...), and some is probably down to how each RDBMS reserves system resources, although the comparisons were made on production systems where only the db was running.

0

This will probably speed up the query.

CREATE INDEX music_url_index ON music (url) USING BTREE;
CREATE INDEX listenTrack_url_index ON listenTrack (url) USING BTREE;

What you really need to know is the total number of comparisons and row scans that take place. To find that out, take a look at the code here on how to do it using EXPLAIN: http://www.siteconsortium.com/h/p1.php?id=mysql002 .

0