Would breaking one query into four to avoid a massive join help?

So, I have a query that looks like this:

 SELECT col1, col2, col3 ...
 FROM action_6_members m
 LEFT JOIN action_6_5pts f ON f.member_id = m.id
 LEFT JOIN action_6_10pts t ON t.member_id = m.id
 LEFT JOIN action_6_weekly w ON w.member_id = m.id
 WHERE `draw_id` = '1'
 ORDER BY m.id DESC
 LIMIT 0, 20;

Right now it produces a massive join (3.5 million × 40 thousand × 20 thousand rows).

so my idea was this:

first run:

 SELECT * FROM action_6_members WHERE draw_id = '1' ORDER BY id DESC LIMIT 0, 20;

then use PHP to build $in = "IN(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20)";

then run (with $in interpolated, since it already contains the IN keyword):

 SELECT * FROM action_6_5pts WHERE member_id $in;
 SELECT * FROM action_6_10pts WHERE member_id $in;
 SELECT * FROM action_6_weekly WHERE member_id $in;

then combine them all in PHP.

So although I use four separate queries, each one selects only about 20 rows, instead of performing the massive join at all.

Will I see a significant performance gain?


Update: so the general consensus is "DON'T DO IT!"

Here is a general overview of the application:

A member enters a code.

The code is either a 5pt code, a 10pt code, or a weekly code.

The three types of code live in separate tables; each of the three tables has a code and a member_id.

member_id references the id column in the action_6_members table.

When a code is entered, the member's details are inserted into the action_6_members table,

and that row's id is then inserted into the table for the entered code type.

The above query selects the first twenty members.

So, my question: what can I do to improve this?

Right now everything is held up until the queries complete.

action_6_members

 CREATE TABLE `action_6_members` (
   `id` int(11) NOT NULL auto_increment,
   `draw_id` int(11) NOT NULL,
   `mobile` varchar(255) NOT NULL,
   `fly_buys` varchar(255) NOT NULL,
   `signup_date` datetime NOT NULL,
   `club` int(11) NOT NULL default '0' COMMENT '1 = yes, 2 = no',
   PRIMARY KEY (`id`)
 ) ENGINE=MyISAM AUTO_INCREMENT=1337 DEFAULT CHARSET=latin1

action_6_5pts and action_6_10pts (identical structure)

 CREATE TABLE `action_6_5pts` (
   `code` varchar(255) NOT NULL,
   `member_id` int(11) NOT NULL,
   PRIMARY KEY (`code`),
   KEY `member_id` (`member_id`)
 ) ENGINE=MyISAM DEFAULT CHARSET=latin1

action_6_weekly

 CREATE TABLE `action_6_weekly` (
   `id` int(11) NOT NULL auto_increment,
   `code` varchar(255) NOT NULL,
   `member_id` int(11) NOT NULL,
   PRIMARY KEY (`id`),
   UNIQUE KEY `id` (`id`),
   KEY `member_id` (`member_id`)
 ) ENGINE=MyISAM AUTO_INCREMENT=3250001 DEFAULT CHARSET=latin1


Update 2: EXPLAIN output for the query

 id  select_type  table  type  possible_keys  key        key_len  ref   rows    Extra
 1   SIMPLE       m      ALL   NULL           NULL       NULL     NULL  1390    Using temporary; Using filesort
 1   SIMPLE       f      ALL   member_id      NULL       NULL     NULL  36000
 1   SIMPLE       t      ALL   member_id      NULL       NULL     NULL  18000   Using where
 1   SIMPLE       w      ref   member_id      member_id  4        m.id  525820  Using where

This just came through: the database server's recent load averages are 7.26, 4.60, 2.45.

1.0 is the normal maximum load. Anything above that means the server has to queue up work for extra processing; i.e. 7.26 means the load is 7x what this blade server can handle on its own, and it had to call in others to help.

So right now this query is beyond a monster; it's the thing monsters eat for snacks...

+4
8 answers

As a rule, if your SQL query can completely model what you want to do, it will most likely be faster than breaking it into pieces and gluing them together in PHP (or any other language), within certain limits.

Those limits are:

  • There is no strange pathological behavior lurking in MySQL for this particular case.
  • You have sensible indexes on all the required columns.
  • There is no case (or no likely case) that can only sensibly be detected/handled in PHP, in which you would want to abort the query partway through.
  • Your result set is not pathologically huge (i.e. it fits in memory and does not exceed the max_allowed_packet size in my.cnf).

Now, none of this says whether your SQL (or your proposed alternative PHP implementation) is optimal for what you are doing; that can only be judged with more information about what your application does and what you are actually trying to achieve. It may be fine; it may not.


Looking briefly at your update with the table structures, nothing jumps out at me as the likely cause of a big performance problem, but:

  • Do not use MyISAM unless you have to. InnoDB is your friend, especially if the tables see a decent amount of write traffic; MyISAM's full-table locks can really bite you. Having FOREIGN KEYs for referential integrity would also be nice.
  • action_6_weekly has id as its PRIMARY KEY and a UNIQUE KEY on ... id . That is redundant: a PRIMARY KEY already acts as a UNIQUE KEY, so you do not need the separate UNIQUE KEY.
  • The EXPLAIN output for your query would be interesting.
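The first two points could be addressed along these lines; this is only a sketch, assuming nothing else in the application relies on MyISAM-specific behavior:

```sql
-- Drop the redundant UNIQUE KEY on action_6_weekly (id is already the PRIMARY KEY)
ALTER TABLE action_6_weekly DROP INDEX id;

-- Convert the tables to InnoDB for row-level locking instead of full-table locks
ALTER TABLE action_6_members ENGINE = InnoDB;
ALTER TABLE action_6_5pts    ENGINE = InnoDB;
ALTER TABLE action_6_10pts   ENGINE = InnoDB;
ALTER TABLE action_6_weekly  ENGINE = InnoDB;
```

Note that the engine conversions rebuild each table, so on large tables they should be run during a quiet period.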
+7

Multiple round trips between the application and the database? No, that will not give you a performance gain over a single query.

+1

You do not need PHP for this; you could do it in a single query with subqueries, or in a stored procedure that runs multiple queries.

To find out which is faster, benchmark both.
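For example, if what you ultimately want is a per-member count of each code type (an assumption, since the original SELECT list is elided), the subquery version might look like this:

```sql
SELECT m.id, m.mobile, m.signup_date,
       (SELECT COUNT(*) FROM action_6_5pts   f WHERE f.member_id = m.id) AS codes_5pt,
       (SELECT COUNT(*) FROM action_6_10pts  t WHERE t.member_id = m.id) AS codes_10pt,
       (SELECT COUNT(*) FROM action_6_weekly w WHERE w.member_id = m.id) AS codes_weekly
FROM action_6_members m
WHERE m.draw_id = '1'
ORDER BY m.id DESC
LIMIT 0, 20;
```

Because the correlated subqueries only run for the rows that end up in the result, this avoids the multiplicative join entirely.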

+1

Oddly enough, I'm going to disagree with the consensus, at least in part.

First of all, you should almost never use LEFT JOIN. It is tempting, but it is almost always a bad idea. I assume that in your case the action_6_5pts, action_6_10pts and action_6_weekly tables may not contain every member id. (I am guessing about your data; if each table does contain every member id, then drop the LEFT from your query and everything will be fine.)

I suspect the real issue may be how your data is laid out in the first place. As a rule, it is good to keep data of the same kind in a single table. I do not want to guess at your data, so here is a pseudo-example. I have seen people take similar data and split it across several tables many times (smaller tables are better, right?). Not always. For example, if you were building an invoicing system, you might be tempted to split the monthly invoices into separate tables, so you create invoice_Jan2010, invoice_Feb2010... etc. But what if you want to search? The same client probably does not appear in every month, so it is hard to get a list for just that client without using LEFT JOINs. Ugh. We do not like LEFT JOINs! They are slow!

The better way to approach this is to have one invoices table with a date column (indexed!) and a client id on each row. A plain JOIN is then guaranteed to find the invoices, and a client with none simply does not appear (which does not matter).

Now, in your case, perhaps you could keep the 5pts and 10pts codes as flagged rows in the same table, with a date for the weekly codes? I am making assumptions here; without knowing more it is hard to give you the "right" answer.
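For illustration only (the table name, the type column, and its values are hypothetical), a single combined code table could look like this:

```sql
-- One row per code, with a type column instead of three separate tables
CREATE TABLE action_6_codes (
  code      VARCHAR(255) NOT NULL,
  member_id INT          NOT NULL,
  type      ENUM('5pt', '10pt', 'weekly') NOT NULL,
  PRIMARY KEY (code),
  KEY member_id (member_id)
) ENGINE = InnoDB;
```

The original query would then need only one JOIN against one table instead of three LEFT JOINs.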

Now, I said I partly disagree with the consensus. If you do not change your schema: as a rule, when you have a very large table like yours, splitting the work into 4 queries using IN lists is a better idea than the LEFT JOIN. If you want to speed it up further, you can combine the last three of them into one with a UNION. That should still be faster than the LEFT JOIN.
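As a sketch of the UNION idea, assuming the first query returned member ids 1 through 20:

```sql
-- One round trip fetches all codes for the 20 members selected earlier
SELECT '5pt'    AS type, code, member_id
  FROM action_6_5pts   WHERE member_id IN (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20)
UNION ALL
SELECT '10pt'   AS type, code, member_id
  FROM action_6_10pts  WHERE member_id IN (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20)
UNION ALL
SELECT 'weekly' AS type, code, member_id
  FROM action_6_weekly WHERE member_id IN (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20);
```

UNION ALL is used deliberately: the type column makes rows from the three tables distinct anyway, so the duplicate-elimination pass that a plain UNION performs would be wasted work.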

You can also verify this easily. Take the query, put the EXPLAIN keyword in front of it, and run it directly against MySQL (using whichever tool you like: the command line, a MySQL GUI, or even phpMyAdmin). That will show you how MySQL plans to join the tables.

Explaining the full output is too much for this answer, but roughly, each row shows how many rows that table contributes to the join; the fewer, the better. It also tells you how the join will be performed. "Using temporary" or "Using filesort" are things you want to avoid where possible (although if you are sorting, a filesort may be unavoidable). There is also a column showing which key the rows will be joined on; if that column is empty, you should try adding an index to make it perform better.

Hope this helps! Good luck

+1

Do not do this. The database joins tables and selects the appropriate rows very quickly, much faster than if you issue lots of single queries.

0

You will not know how much of a performance gain this approach gives until you try it. In my experience, the effect of turning these kinds of queries into discrete ones is not something you can predict. What you are looking for is the watershed point in MySQL where building internal temporary tables beyond a certain size becomes a killer. Once you find where that point is in your installation, you can play games with splitting queries and post-processing the results.

0

You should use an IN clause together with the JOIN rather than relying on the LIMIT alone: the LIMIT is applied after the join, not as part of it.

0

I may be going crazy, but I do not see an index on the action_6_members table for the column you filter on, draw_id, in the original query.

This means the query has to scan all the rows in the action_6_members table and then join the others.

Adding an index on the draw_id column will probably help here.

You could create a composite key ( draw_id , id ), but it probably will not gain you much unless those are the only columns you pull from action_6_members (in that case the query could be answered from the multi-column index instead of reading through the table data).
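A sketch of the two options (the index names are made up):

```sql
-- Index the filtered column so the WHERE draw_id = '1' no longer requires a full scan
ALTER TABLE action_6_members ADD INDEX idx_draw_id (draw_id);

-- Or the composite version, which can also serve the ORDER BY id DESC
ALTER TABLE action_6_members ADD INDEX idx_draw_id_id (draw_id, id);
```

You would normally add one or the other, not both, since the composite index can already answer queries filtering on draw_id alone.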

Hope this helps ...

0
