Slow MySQL Query - cache data in a PHP array?

I need to select some data from a MySQL database using PHP. This can be done with a single MySQL query, which takes 5 minutes to run on a good server (several JOINs on tables with more than 10 million rows).

I was wondering whether it would be better to move part of the query into PHP and use some loops rather than doing it all in MySQL. Also, would it be better to load all emails from a single table with 150,000 rows into an array and then check against the array, instead of running thousands of MySQL SELECTs?

Here is the query:

    SELECT count(contacted_emails.id), contacted_emails.email
    FROM contacted_emails
    LEFT OUTER JOIN blacklist ON contacted_emails.email = blacklist.email
    LEFT OUTER JOIN submission_authors ON contacted_emails.email = submission_authors.email
    LEFT OUTER JOIN users ON contacted_emails.email = users.email
    GROUP BY contacted_emails.email
    HAVING count(contacted_emails.id) > 3

EXPLAIN returns: [image of the EXPLAIN output, not preserved]

Indexes in the 4 tables:

    contacted_emails: id, blacklist_section_id, journal_id and email
    blacklist: id, email and name
    submission_authors: id, hash_key and email
    users: id, email, firstname, lastname, editor_id, title_id, country_id, workplace_id, jobtype_id

The contacted_emails table is created as:

    CREATE TABLE contacted_emails (
        id int(10) unsigned NOT NULL AUTO_INCREMENT,
        email varchar(150) COLLATE utf8_unicode_ci NOT NULL,
        contacted_at datetime NOT NULL,
        created_at datetime NOT NULL,
        blacklist_section_id int(11) unsigned NOT NULL,
        journal_id int(10) DEFAULT NULL,
        PRIMARY KEY (id),
        KEY blacklist_section_id (blacklist_section_id),
        KEY journal_id (journal_id),
        KEY email (email)
    ) ENGINE=InnoDB AUTO_INCREMENT=4491706 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
+7
arrays php mysql
4 answers

Following the recommendations, I chose this solution:

    SELECT ce.email, ce.number_of_contacts
    FROM (
        SELECT email, COUNT(id) AS number_of_contacts
        FROM contacted_emails
        GROUP BY email
        HAVING number_of_contacts > 3
    ) AS ce
    -- Explicit ON clauses: a chained NATURAL LEFT JOIN would also match on the shared id columns
    LEFT JOIN blacklist AS bl ON ce.email = bl.email
    LEFT JOIN submission_authors AS sa ON ce.email = sa.email
    LEFT JOIN users AS u ON ce.email = u.email
    WHERE bl.email IS NULL
      AND sa.email IS NULL
      AND u.email IS NULL

It takes 10 seconds to run, which is acceptable for now. Once I have more data in the database, I will need to think about another solution, such as building a temporary table.

So, in conclusion, loading the whole table into a PHP array is not as good for performance as letting MySQL do the work.

0

Your indexes look great.

The performance problems seem to come from the fact that you are JOINing all rows and only then filtering with HAVING.

This will probably work better:

    SELECT *
    FROM (
        SELECT email, COUNT(id) AS number_of_contacts
        FROM contacted_emails
        GROUP BY email
        HAVING COUNT(id) > 3
    ) AS ce
    LEFT OUTER JOIN blacklist AS bl ON ce.email = bl.email
    LEFT OUTER JOIN submission_authors AS sa ON ce.email = sa.email
    LEFT OUTER JOIN users AS u ON ce.email = u.email
    /* EDIT: Exclude-join clause added based on comments below */
    WHERE bl.email IS NULL
      AND sa.email IS NULL
      AND u.email IS NULL

Here you reduce the dataset with GROUP BY and HAVING before the JOINs, which is significantly more efficient.

That said, given the context of your original query, the LEFT OUTER JOINed tables don't appear to be used at all, so the query below will probably return the same results with even less overhead:

    SELECT email, COUNT(id) AS number_of_contacts
    FROM contacted_emails
    GROUP BY email
    HAVING count(id) > 3

What is the point of these JOINed tables? A LEFT JOIN does not let them reduce the result set, and you are only looking at aggregate data from contacted_emails. Did you mean to use INNER JOIN instead?


EDIT: You mentioned that the point of the joins is to exclude emails that already exist in the other tables. I modified my first query to do a proper exclusion join (this was a mistake in your originally posted code).

Here is another possible option that may work well for you:

    SELECT contacted_emails.email, COUNT(id) AS number_of_contacts
    FROM contacted_emails
    LEFT JOIN (
        SELECT email FROM blacklist
        UNION ALL
        SELECT email FROM submission_authors
        UNION ALL
        SELECT email FROM users
    ) AS existing ON contacted_emails.email = existing.email
    WHERE existing.email IS NULL
    GROUP BY contacted_emails.email
    HAVING COUNT(id) > 3

What I'm doing here is gathering the existing emails in a subquery and doing a single exclusion join against that derived table.

Another way you can try to express this is as a non-correlated subquery in the WHERE clause:

    SELECT email, COUNT(id) AS number_of_contacts
    FROM contacted_emails
    WHERE email NOT IN (
        SELECT email FROM blacklist
        UNION ALL
        SELECT email FROM submission_authors
        UNION ALL
        SELECT email FROM users
    )
    GROUP BY email
    HAVING COUNT(id) > 3

Try them all and see what gives the best execution plan in MySQL.
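
To compare the variants, you can prefix each one with EXPLAIN and look at the plan MySQL chooses. A minimal sketch using the NOT IN variant; the select list (email plus a count aliased as number_of_contacts) is an assumption carried over from the earlier queries, and which plan turns out best depends on your MySQL version and data:

```sql
-- Show the execution plan instead of running the query.
EXPLAIN
SELECT email, COUNT(id) AS number_of_contacts
FROM contacted_emails
WHERE email NOT IN (
    SELECT email FROM blacklist
    UNION ALL
    SELECT email FROM submission_authors
    UNION ALL
    SELECT email FROM users
)
GROUP BY email
HAVING COUNT(id) > 3;
```

In the output, the type, key and rows columns are the ones to look at first: type ALL means a full table scan, key shows which index (if any) is used, and rows is the optimizer's estimate of how many rows it has to examine.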

+3

A few thoughts. In terms of the query, you may find it faster if you select

 count(*) row_count 

and change the HAVING to

 row_count > 3 

since this can be answered from the contacted_emails.email index without having to access the row to fetch contacted_emails.id. Since both fields are NOT NULL and contacted_emails is the base table, the logic should be the same.
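
Put together, and with the exclusion joins left out for brevity, the suggestion might look like the sketch below. It relies on MySQL accepting a select-list alias in HAVING; whether it is actually faster is something to check with EXPLAIN on your own data.

```sql
-- Count rows per email; COUNT(*) does not need any particular column.
SELECT email, COUNT(*) AS row_count
FROM contacted_emails
GROUP BY email
HAVING row_count > 3;
```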

Since this query will only get slower as more data is collected, I would suggest a summary table in which you store the counts (possibly per unit of time). This can be updated periodically with a cron job, or on the fly using triggers and/or application logic.

If you go with the per-unit-of-time option based on the created_at field and/or store the time of the last cron update, you can still get real-time results by pulling in and adding the most recent data.

Any caching solution would have to be tuned anyway to stay real-time, and the full query would still run every time the cached data was cleared or refreshed.
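
A rough sketch of the per-unit-of-time variant, assuming one day as the unit and a daily cron job. The table name contacted_email_daily_counts and its columns are hypothetical; only contacted_emails, email and created_at come from the question.

```sql
-- Hypothetical summary table: one row per email per day.
CREATE TABLE contacted_email_daily_counts (
    email         VARCHAR(150) COLLATE utf8_unicode_ci NOT NULL,
    contact_day   DATE NOT NULL,
    contact_count INT UNSIGNED NOT NULL,
    PRIMARY KEY (email, contact_day)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

-- Run once a day from cron, after midnight: summarise yesterday's rows.
-- REPLACE makes the job safe to re-run for the same day.
REPLACE INTO contacted_email_daily_counts (email, contact_day, contact_count)
SELECT email, DATE(created_at), COUNT(*)
FROM contacted_emails
WHERE created_at >= CURDATE() - INTERVAL 1 DAY
  AND created_at <  CURDATE()
GROUP BY email, DATE(created_at);

-- Real-time result: stored daily counts plus today's not yet summarised rows.
SELECT email, SUM(cnt) AS number_of_contacts
FROM (
    SELECT email, contact_count AS cnt
    FROM contacted_email_daily_counts
    UNION ALL
    SELECT email, COUNT(*) AS cnt
    FROM contacted_emails
    WHERE created_at >= CURDATE()
    GROUP BY email
) AS combined
GROUP BY email
HAVING number_of_contacts > 3;
```

The daily job only touches one day of contacted_emails, and the final query only scans today's rows from the large table. Existing history would need a one-off backfill before the first run.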

As indicated in the comments, the database is built to aggregate large amounts of data. PHP is not.

+2

You will probably be better off with a summary table that is updated via a trigger on every insert into your contacted_emails table. This summary table should have an email column and a count column. Each insert into the contact table updates the count. Put an index on the count column of the summary table. Then you can query directly from THAT to get the emails in question, and THEN join to pull whatever other details you need.
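
A minimal sketch of that idea. The table name contacted_email_counts, the trigger name and the column contact_count are hypothetical; the exclusion joins reuse the tables from the question.

```sql
-- Hypothetical summary table: one row per email with its running count.
CREATE TABLE contacted_email_counts (
    email         VARCHAR(150) COLLATE utf8_unicode_ci NOT NULL,
    contact_count INT UNSIGNED NOT NULL DEFAULT 0,
    PRIMARY KEY (email),
    KEY contact_count (contact_count)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

-- One-off backfill from the rows that already exist.
INSERT INTO contacted_email_counts (email, contact_count)
SELECT email, COUNT(*)
FROM contacted_emails
GROUP BY email;

-- Keep the count current on every insert. The trigger body is a single
-- statement, so no DELIMITER change is needed.
CREATE TRIGGER contacted_emails_after_insert
AFTER INSERT ON contacted_emails
FOR EACH ROW
    INSERT INTO contacted_email_counts (email, contact_count)
    VALUES (NEW.email, 1)
    ON DUPLICATE KEY UPDATE contact_count = contact_count + 1;

-- Query the small summary table first, then apply the exclusions.
SELECT c.email, c.contact_count
FROM contacted_email_counts AS c
LEFT JOIN blacklist AS bl ON c.email = bl.email
LEFT JOIN submission_authors AS sa ON c.email = sa.email
LEFT JOIN users AS u ON c.email = u.email
WHERE c.contact_count > 3
  AND bl.email IS NULL
  AND sa.email IS NULL
  AND u.email IS NULL;
```

This sketch only handles inserts. If rows in contacted_emails can be deleted or have their email changed, matching AFTER DELETE and AFTER UPDATE triggers would be needed to keep the counts accurate.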

+2
