Increase speed for MySQL JOIN for two large tables

I need to JOIN large tables in a MySQL query, and it takes a very long time: about 180 seconds. Are there any tips for optimizing joins?

My table has 10 fields. I use only 4 in the query - all rows. The table contains about 600,000 rows, and the result should have about 50 rows.

The four columns used are: Title, Variables, Location, Date

Here is my query:

SELECT DISTINCT t1.Title, t1.Variables
FROM `MyTABLE` t1
JOIN `MyTABLE` t2 USING (Title, Variables)
WHERE (t1.Location, t1.Date) = ('Location1', 'Date1')
  AND (t2.Location, t2.Date) = ('Location2', 'Date2')
+6
mysql
8 answers

As others pointed out, you need the appropriate indexes. For this specific query, you can use indexes such as:

(Location, Date) or (Date, Location) for the WHERE clause, as well as (Title, Variables) or (Variables, Title) for the join condition.
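
As a rough sketch of the indexing advice above, the snippet below builds a tiny stand-in for `MyTABLE` in an in-memory SQLite database, creates one index for the WHERE columns and one for the join columns, and runs the join with plain comparisons. SQLite is used only so the example runs anywhere; the `CREATE INDEX` statements use syntax that MySQL also accepts. The index names and sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here (assumption)
conn.execute(
    "CREATE TABLE MyTABLE (Title TEXT, Variables TEXT, Location TEXT, Date TEXT)"
)
conn.executemany(
    "INSERT INTO MyTABLE VALUES (?, ?, ?, ?)",
    [
        ("A", "temp", "Location1", "Date1"),
        ("A", "temp", "Location2", "Date2"),
        ("B", "wind", "Location1", "Date1"),  # no partner at Location2
        ("C", "rain", "Location2", "Date2"),  # no partner at Location1
    ],
)

# One index for the WHERE filter, one for the join columns (hypothetical names).
conn.execute("CREATE INDEX idx_location_date ON MyTABLE (Location, Date)")
conn.execute("CREATE INDEX idx_title_variables ON MyTABLE (Title, Variables)")

join_sql = """
    SELECT DISTINCT t1.Title, t1.Variables
    FROM MyTABLE t1
    JOIN MyTABLE t2 ON t1.Title = t2.Title AND t1.Variables = t2.Variables
    WHERE t1.Location = 'Location1' AND t1.Date = 'Date1'
      AND t2.Location = 'Location2' AND t2.Date = 'Date2'
"""
rows = conn.execute(join_sql).fetchall()

# List the indexes that now exist on the table (column 1 of the pragma is the name).
index_names = {row[1] for row in conn.execute("PRAGMA index_list('MyTABLE')")}
print(rows, sorted(index_names))
```

On MySQL itself, prefixing the query with EXPLAIN is the way to check which of the two indexes the optimizer actually picks.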

It would be useful to know the exact size (i.e. data type) of the Location, Date, Title and Variables columns, since a large index is likely to be slower than a small one.

Finally, just a tip: I would not use fancy row-constructor comparisons the way you do.

 USING (Title, Variables) 

is probably fine, but I would certainly check whether

 (t1.Location, t1.Date) = ('Location1', 'Date1') 

and

 (t2.Location, t2.Date) = ('Location2', 'Date2') 

behave as you expect. I would definitely run EXPLAIN on it and compare the output with a "regular", old-fashioned comparison, for example:

  t1.Location = 'Location1' AND t1.Date = 'Date1' AND t2.Location = 'Location2' AND t2.Date = 'Date2' 

You can argue that logically the two are one and the same, so it should not matter, and you would be right. But then again, the MySQL optimizer is not very smart, and there is always the possibility of bugs, especially in features that are not used much. I think this is such a feature. Therefore, I would at least try EXPLAIN and see whether these alternative notations are evaluated the same way.

But, as BenoKrapo pointed out, wouldn't it be easier to do something like this:

 SELECT Title, Variables
 FROM MyTABLE
 WHERE (Location = 'Location1' AND Date = 'Date1')
    OR (Location = 'Location2' AND Date = 'Date2')
 GROUP BY Title, Variables
 HAVING COUNT(*) >= 2

EDIT: I changed HAVING COUNT(*) = 2 to HAVING COUNT(*) >= 2 . See comments (thanks again BenoKrapo)
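
One thing that may be worth checking with this approach: `COUNT(*) >= 2` also accepts a (Title, Variables) pair that occurs twice at the same location and never at the other one. The sketch below illustrates this on made-up rows in an in-memory SQLite database (a stand-in for MySQL, assumed only for runnability) and compares it with a stricter `HAVING` that requires at least one row from each (Location, Date) pair. The stricter form is a suggestion, not part of the original answer; it relies on comparisons evaluating to 0 or 1, which holds in both MySQL and SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL (assumption)
conn.execute(
    "CREATE TABLE MyTABLE (Title TEXT, Variables TEXT, Location TEXT, Date TEXT)"
)
conn.executemany(
    "INSERT INTO MyTABLE VALUES (?, ?, ?, ?)",
    [
        ("A", "temp", "Location1", "Date1"),
        ("A", "temp", "Location2", "Date2"),
        ("B", "wind", "Location1", "Date1"),
        ("D", "snow", "Location1", "Date1"),
        ("D", "snow", "Location1", "Date1"),  # duplicate row, only at Location1
    ],
)

# Shared part of both queries: keep the two (Location, Date) pairs, then group.
base_sql = """
    SELECT Title, Variables
    FROM MyTABLE
    WHERE (Location = 'Location1' AND Date = 'Date1')
       OR (Location = 'Location2' AND Date = 'Date2')
    GROUP BY Title, Variables
"""

# Variant from the answer: at least two matching rows of any kind.
count_sql = base_sql + " HAVING COUNT(*) >= 2 ORDER BY Title"

# Stricter variant: at least one matching row for each pair.
strict_sql = base_sql + """
    HAVING SUM(Location = 'Location1' AND Date = 'Date1') > 0
       AND SUM(Location = 'Location2' AND Date = 'Date2') > 0
    ORDER BY Title
"""

count_rows = conn.execute(count_sql).fetchall()
strict_rows = conn.execute(strict_sql).fetchall()
print(count_rows, strict_rows)
```

If (Title, Variables, Location, Date) is known to be unique in the real table, the two variants return the same rows and the simpler `COUNT(*)` form is enough.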

EDIT: a few days after posting this answer, I found this post from Mark Callaghan, MySQL Architect for Facebook: http://www.facebook.com/note.php?note_id=243134480932 Essentially, it describes how similar-but-different "smart" comparisons perform very poorly because of a MySQL optimizer bug. So what I want to say is: if performance is suffering, try not using that syntax; you may have run into a bug.

+8

Yes. Create the appropriate indexes based on the queries that will run against the corresponding tables.

+2

Can you prepend EXPLAIN to the SQL statement and run it again? The slowness is probably due to missing indexes on the columns you join on.

Also try STRAIGHT_JOIN, listing the smaller table on the left and the other on the right, to hint to MySQL which table to read first.

+2

Make sure the fields you match on are indexed. Matching numeric values is also faster than matching strings.

But it would be easier to just write

 SELECT DISTINCT Title, Variables
 FROM `MyTABLE`
 WHERE (Location = 'Location1' AND Date = 'Date1')
    OR (Location = 'Location2' AND Date = 'Date2')
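
Before swapping this in, it may be worth confirming that it answers the same question as the original self-join: the `OR` keeps pairs found at either (Location, Date), while the join keeps only pairs found at both. A small sketch on made-up rows, again using an in-memory SQLite database as an assumed stand-in for MySQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL (assumption)
conn.execute(
    "CREATE TABLE MyTABLE (Title TEXT, Variables TEXT, Location TEXT, Date TEXT)"
)
conn.executemany(
    "INSERT INTO MyTABLE VALUES (?, ?, ?, ?)",
    [
        ("A", "temp", "Location1", "Date1"),
        ("A", "temp", "Location2", "Date2"),
        ("B", "wind", "Location1", "Date1"),  # present at Location1 only
    ],
)

# The shorter query: pairs found at either (Location, Date).
either_sql = """
    SELECT DISTINCT Title, Variables
    FROM MyTABLE
    WHERE (Location = 'Location1' AND Date = 'Date1')
       OR (Location = 'Location2' AND Date = 'Date2')
    ORDER BY Title
"""

# The original self-join: pairs found at both.
both_sql = """
    SELECT DISTINCT t1.Title, t1.Variables
    FROM MyTABLE t1
    JOIN MyTABLE t2 ON t1.Title = t2.Title AND t1.Variables = t2.Variables
    WHERE t1.Location = 'Location1' AND t1.Date = 'Date1'
      AND t2.Location = 'Location2' AND t2.Date = 'Date2'
    ORDER BY t1.Title
"""

either_rows = conn.execute(either_sql).fetchall()
both_rows = conn.execute(both_sql).fetchall()
print(either_rows, both_rows)
```
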
+1

This might be cheating a little, but I actually found it faster to run two queries and combine their results in PHP afterwards. This only works because I select two different variables.

 // Assumes $mysqli is an open mysqli connection (not shown in the original).
 $Array_result1 = array();
 $query = "SELECT DISTINCT Title, Variables FROM MyTABLE WHERE Location='Location1' AND Variable='Variable1'";
 $result = $mysqli->query($query);
 while ($row = $result->fetch_assoc()) {
     $Array_result1[$row['Title']] = $row['Variables'];
 }

 $Array_result2 = array();
 $query = "SELECT DISTINCT Title, Variables FROM MyTABLE WHERE Location='Location2' AND Variable='Variable2'";
 $result = $mysqli->query($query);
 while ($row = $result->fetch_assoc()) {
     $Array_result2[$row['Title']] = $row['Variables'];
 }

 // array_intersect() compares values only; array_intersect_assoc() would also compare the Title keys.
 $Array_result = array_intersect($Array_result1, $Array_result2);

I liked the idea of using a single MySQL query to combine the two, but this is much faster.

+1

Without a description of the tables and the query, there is not much we can do to help.

There are several things that can determine the speed of a join.

  • Database engine: Do you use InnoDB or MyISAM? Or maybe some other engine? Some of them are faster at lookups than others, which affects joins.
  • Indexes: Are the columns being matched indexed?
  • Partitioning: perhaps you can partition the table on the indexed columns to make it even faster?

Also, run the query through EXPLAIN, which shows the steps MySQL takes to execute it. It can help you tremendously.

0

Try a composite index on the columns in the WHERE clause, and add the other columns from the SELECT list as included columns (MySQL has no INCLUDE clause, so there you append them to the index key). This makes the index covering and saves the cost of the usual lookup back into the table.
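
A minimal sketch of that covering-index idea, assuming an in-memory SQLite database as a stand-in for MySQL and a hypothetical index name `idx_cover`. The filter columns come first in the key, followed by the selected columns, so the query can be answered from the index alone. `EXPLAIN QUERY PLAN` is SQLite's counterpart of MySQL's `EXPLAIN`, and its output format differs from MySQL's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL (assumption)
conn.execute(
    "CREATE TABLE MyTABLE "
    "(Title TEXT, Variables TEXT, Location TEXT, Date TEXT, Notes TEXT)"
)
conn.execute(
    "INSERT INTO MyTABLE VALUES ('A', 'temp', 'Location1', 'Date1', 'unused')"
)

# WHERE columns first, then the SELECT columns appended to the key.
conn.execute(
    "CREATE INDEX idx_cover ON MyTABLE (Location, Date, Title, Variables)"
)

query = (
    "SELECT Title, Variables FROM MyTABLE "
    "WHERE Location = 'Location1' AND Date = 'Date1'"
)
rows = conn.execute(query).fetchall()

# The last column of each plan row is a human-readable description of the step.
plan = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
plan_text = " ".join(str(row[-1]) for row in plan)
print(rows)
print(plan_text)
```

On MySQL, the corresponding sign in the EXPLAIN output is "Using index" in the Extra column.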

0

I made two separate joins and combined the results using the UNION operator. I saw a good improvement in run time.

 SELECT t1.Title, t1.Variables FROM MyTable t1 JOIN MyTable t2 ON (t1.Location, t1.Date) = ('Location1', 'Date1')
 UNION
 SELECT t1.Title, t1.Variables FROM MyTable t1 JOIN MyTable t2 ON (t2.Location, t2.Date) = ('Location2', 'Date2');

Make sure that both queries have the same number of columns and the same data type for each column. Also check the column order in each SELECT list.

0
