Which MySQL JOIN query is more efficient?

Given the following table structure:

    CREATE TABLE user (
        uid INT(11) auto_increment,
        name VARCHAR(200),
        PRIMARY KEY (uid)
    );

    CREATE TABLE user_profile (
        uid INT(11),
        address VARCHAR(200),
        PRIMARY KEY (uid),
        INDEX (address)
    );

Which query is more efficient, #1:

    SELECT u.name
    FROM user u
    INNER JOIN user_profile p ON u.uid = p.uid
    WHERE p.address = 'some constant'

or #2:

    SELECT u.name
    FROM user u
    INNER JOIN (SELECT uid FROM user_profile WHERE address = 'some constant') p
        ON u.uid = p.uid

What is the difference in performance?

4 answers

The first syntax is usually more efficient.

MySQL buffers (materializes) derived tables, so using a derived table prevents user_profile from being the driven (inner) table in the join.

Even if user_profile is the leading table, the results of the subquery must be buffered first, which costs both memory and time.

Adding a LIMIT to the queries makes the first one much faster, but not the second, because the whole derived table is still built before the outer query starts returning rows.
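You can check this on your own data by running EXPLAIN on both forms from the question with a LIMIT added. The LIMIT 10 below is an arbitrary value chosen for illustration, and the queries assume the question's user and user_profile tables. MySQL 5.7 and later can merge a simple derived table like this one into the outer query (the derived_merge optimizer switch), so on those versions the two plans may come out the same. The commented-out line shows how to turn merging off for the session so you can reproduce the materialized plan.

```sql
-- Optional, MySQL 5.7+ only: disable derived-table merging for this session
-- so the second query shows the materialized <derived2> plan.
-- SET SESSION optimizer_switch = 'derived_merge=off';

-- Form #1: plain join, the LIMIT can stop the join early.
EXPLAIN
SELECT u.name
FROM user u
INNER JOIN user_profile p ON u.uid = p.uid
WHERE p.address = 'some constant'
LIMIT 10;

-- Form #2: derived table in FROM (arbitrary LIMIT 10 for comparison).
EXPLAIN
SELECT u.name
FROM user u
INNER JOIN (SELECT uid FROM user_profile WHERE address = 'some constant') p
    ON u.uid = p.uid
LIMIT 10;
```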

Here are example plans. The t_source table has an index on (val, nid):

First query:

    EXPLAIN
    SELECT *
    FROM t_source s1
    JOIN t_source s2 ON s2.nid = s1.id
    WHERE s2.val = 1

    id, select_type, table, type, possible_keys, key, key_len, ref, rows, Extra
    1, 'SIMPLE', 's1', 'ALL', 'PRIMARY', '', '', '', 1000000, ''
    1, 'SIMPLE', 's2', 'ref', 'ix_source_val,ix_source_val_nid,ix_source_vald_nid', 'ix_source_val_nid', '8', 'const,test.s1.id', 1, 'Using where'

Second query:

    EXPLAIN
    SELECT *
    FROM t_source s1
    JOIN (
        SELECT nid
        FROM t_source s2
        WHERE val = 1
    ) q ON q.nid = s1.id

    id, select_type, table, type, possible_keys, key, key_len, ref, rows, Extra
    1, 'PRIMARY', '<derived2>', 'ALL', '', '', '', '', 100000, ''
    1, 'PRIMARY', 's1', 'ref', 'PRIMARY', 'PRIMARY', '4', 'q.nid', 10000, 'Using where'
    2, 'DERIVED', 's2', 'ref', 'ix_source_val,ix_source_val_nid,ix_source_vald_nid', 'ix_source_vald_nid', '4', '', 91324, 'Using index'

As you can see, in the second case only part of an index is used (key_len 4, versus 8 in the first plan), and q is the leading table.

Update:

Derived tables (the subject of this question) should not be confused with subqueries.

MySQL cannot optimize derived tables (those used in the FROM clause), but subqueries (those used with IN or EXISTS) are handled much better.
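As a sketch, query #2 can be written with IN or EXISTS instead of a derived table in FROM. Because uid is the primary key of user_profile, each user matches at most one profile row, so both forms return the same names as query #1. How well each form is optimized still varies between MySQL versions, so it is worth checking the plan with EXPLAIN.

```sql
-- Query #2 rewritten with an IN subquery (no derived table in FROM).
SELECT u.name
FROM user u
WHERE u.uid IN (
    SELECT p.uid
    FROM user_profile p
    WHERE p.address = 'some constant'
);

-- The same filter written as a correlated EXISTS subquery.
SELECT u.name
FROM user u
WHERE EXISTS (
    SELECT 1
    FROM user_profile p
    WHERE p.uid = u.uid
      AND p.address = 'some constant'
);
```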

See these blog posts for more details:


Running EXPLAIN on these queries gives the following (columns are id, select_type, table, type, possible_keys, key, key_len, ref, rows, Extra):

    id  select_type  table  type    possible_keys    key      key_len  ref    rows  Extra
    1   SIMPLE       u      system  PRIMARY          NULL     NULL     NULL   1
    1   SIMPLE       p      const   PRIMARY,address  PRIMARY  4        const  1

And EXPLAIN for the second:

    id  select_type  table       type    possible_keys  key      key_len  ref   rows  Extra
    1   PRIMARY      u           system  PRIMARY        NULL     NULL     NULL  1
    1   PRIMARY      <derived2>  system  NULL           NULL     NULL     NULL  1
    2   DERIVED      p           ref     address        address  201            1     Using where

So the first query is simpler and usually more efficient.

However, judging from your CREATE statements, it would be more efficient to add an address column to the user table. Since user_profile is 1-to-1 with user (on uid), you can merge the two tables and the schema stays normalized.
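Here is a minimal migration sketch. It assumes the addresses should simply be copied from user_profile, and that you drop user_profile yourself once nothing reads it any more. The statements use MySQL's multi-table UPDATE syntax.

```sql
-- Add the address column, plus an index on it, to user.
ALTER TABLE user
    ADD COLUMN address VARCHAR(200),
    ADD INDEX (address);

-- Copy each profile's address onto its user row (1-to-1 on uid).
UPDATE user u
INNER JOIN user_profile p ON p.uid = u.uid
SET u.address = p.address;

-- Only once the application no longer reads user_profile:
-- DROP TABLE user_profile;
```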

Then your query becomes:

    SELECT u.name
    FROM user u
    WHERE u.address = 'some constant'

and EXPLAIN shows:

    id  select_type  table  type  possible_keys  key      key_len  ref    rows  Extra
    1   SIMPLE       u      ref   address        address  201      const  1     Using where; Using filesort

Oddly enough, the plan for the simplified schema uses a filesort, which is bad if many rows match.

Read more about EXPLAIN: http://dev.mysql.com/doc/refman/5.0/en/explain.html


I'm not sure how the MySQL query optimizer will handle this, but my guess is that the first query will perform better.

The first query is also more standard and easier to read, which makes it preferable.


The answer usually depends on the statistics collected by the database. The first form seems to be easier for the optimizer.

As far as I remember, MySQL does not handle IN (...) queries and subqueries well.

