MySQL subquery with user variables

Question

MySQL subquery with user variables

I am trying to execute a query that requires a computed column using a subquery that passes a date reference through a variable. I am not sure that I am not “doing it right”, but, in fact, the request never ends and rotates for several minutes in a row. This is my request:

select @groupdate:=date_format(order_date,'%Y-%m'), count(distinct customer_email) as num_cust, ( select count(distinct cev.customer_email) as num_prev from _pj_cust_email_view cev inner join _pj_cust_email_view as prev_purch on (prev_purch.order_date < @groupdate) and (cev.customer_email=prev_purch.customer_email) where cev.order_date > @groupdate ) as prev_cust_count from _pj_cust_email_view group by @groupdate;

The subquery has inner join performs a self-join, which gives me only the number of people who acquired up to date on @groupdate . EXPLAIN below:

 +----+----------------------+---------------------+------+---------------+-----------+---------+---------------------------+--------+---------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+----------------------+---------------------+------+---------------+-----------+---------+---------------------------+--------+---------------------------------+ | 1 | PRIMARY | _pj_cust_email_view | ALL | NULL | NULL | NULL | NULL | 140147 | Using temporary; Using filesort | | 2 | UNCACHEABLE SUBQUERY | cev | ALL | IDX_EMAIL | NULL | NULL | NULL | 140147 | Using where | | 2 | UNCACHEABLE SUBQUERY | prev_purch | ref | IDX_EMAIL | IDX_EMAIL | 768 | cart_A.cev.customer_email | 1 | Using where | +----+----------------------+---------------------+------+---------------+-----------+---------+---------------------------+--------+---------------------------------+

And the structure of the _pj_cust_email_view table is as follows:

 '_pj_cust_email_view', 'CREATE TABLE `_pj_cust_email_view` ( `order_date` varchar(10) CHARACTER SET utf8 DEFAULT NULL, `customer_email` varchar(255) CHARACTER SET utf8 DEFAULT NULL, KEY `IDX_EMAIL` (`customer_email`), KEY `IDX_ORDERDATE` (`order_date`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1'

Again, as I said earlier, I'm not sure if this is the best way to achieve this. Any criticism, direction are welcome!

Update

I have moved forward a bit, and now I am doing the above procedure, repeating all the known months, not the months in the database, and setting vars in advance. I don't like it yet. This is what I have now:

Sets custom vars

 set @startdate:='2010-08', @enddate:='2010-09';

Gets the total number of emails in a given range

 select count(distinct customer_email) as num_cust from _pj_cust_email_view where order_date between @startdate and @enddate;

Gets the total number of customers who have purchased up to a given range.

 select count(distinct cev.customer_email) as num_prev from _pj_cust_email_view cev inner join _pj_cust_email_view as prev_purch on (prev_purch.order_date < @startdate) and (cev.customer_email=prev_purch.customer_email) where cev.order_date between @startdate and @enddate;

Where @startdate set to the beginning of the month, and @enddate means the end of the range of that month.

It really seems to me that this can still be done in one complete request.

+8

mysql

philwinkle Feb 23 '11 at 16:09

source share

2 answers

Although Bill has a good query using multiple tables, he also does this with SQL variables, so no extra table. An internal query is attached to your _pj_cust_email_view table and a limit of 10 means that it is returned only after 10 months from the current month. So there is no hard coding of dates, it is calculated on the fly ... if you want more or less months, just change the LIMIT clause.

By setting @dt: = as the LAST field in the internal query, only THEN makes the base date assigned to the next recording cycle to create your qualification dates ...

 select justDates.FirstOfMonth, count( distinct EMCurr.customer_Email ) UniqThisMonth, count( distinct EMLast.customer_Email ) RepeatCustomers from ( SELECT @dt FirstOfMonth, last_day( @dt ) EndOfMonth, @dt:= date_sub(@dt, interval 1 month) nextCycle FROM (select @dt := date_sub( current_date(), interval dayofmonth( current_date())-1 day )) vars, _pj_cust_email_view limit 10 ) JustDates join _pj_cust_email_view EMCurr on EMCurr.order_Date between JustDates.FirstOfMonth and JustDates.EndOfMonth left join _pj_cust_email_view EMLast on EMLast.order_Date < JustDates.FirstOfMonth and EMCurr.customer_Email = EMLast.customer_Email group by 1

0

DRapp Mar 2 '11 at 1:35

source share

Bill karwin · Accepted Answer · 2011-03-01T23:24:45+0000

I don’t think you need to use subqueries at all, and you don’t have to go through the months.

Instead, I recommend that you create a table to store all months. Even if you sprinkle it with 100-year months, it will only have 1,200 lines, which is trivial.

 CREATE TABLE Months ( start_date DATE, end_date DATE, PRIMARY KEY (start_date, end_date) ); INSERT INTO Months (start_date, end_date) VALUES ('2011-03-01', '2011-03-31');

Keep the actual start and end dates, so you can use the DATE data type and index two columns correctly.

edit: I think I understand your requirement a little better, and I cleared this answer. The following query may be right for you:

 SELECT DATE_FORMAT(m.start_date, '%Y-%m') AS month, COUNT(DISTINCT cev.customer_email) AS current, GROUP_CONCAT(DISTINCT cev.customer_email) AS current_email, COUNT(DISTINCT prev.customer_email) AS earlier, GROUP_CONCAT(DISTINCT prev.customer_email) AS earlier_email FROM Months AS m LEFT OUTER JOIN _pj_cust_email_view AS cev ON cev.order_date BETWEEN m.start_date AND m.end_date INNER JOIN Months AS mprev ON mprev.start_date <= m.start_date LEFT OUTER JOIN _pj_cust_email_view AS prev ON prev.order_date BETWEEN mprev.start_date AND mprev.end_date GROUP BY month;

If you create the following composite index in your table:

 CREATE INDEX order_email on _pj_cust_email_view (order_date, customer_email);

Then the query has the best chance of being a query only for the index and will work much faster.

Below is an EXPLAIN optimization report from this query. Note type: index for each table.

 *************************** 1. row *************************** id: 1 select_type: SIMPLE table: m type: index possible_keys: PRIMARY key: PRIMARY key_len: 6 ref: NULL rows: 4 Extra: Using index; Using temporary; Using filesort *************************** 2. row *************************** id: 1 select_type: SIMPLE table: mprev type: index possible_keys: PRIMARY key: PRIMARY key_len: 6 ref: NULL rows: 4 Extra: Using where; Using index; Using join buffer *************************** 3. row *************************** id: 1 select_type: SIMPLE table: cev type: index possible_keys: order_email key: order_email key_len: 17 ref: NULL rows: 10 Extra: Using index *************************** 4. row *************************** id: 1 select_type: SIMPLE table: prev type: index possible_keys: order_email key: order_email key_len: 17 ref: NULL rows: 10 Extra: Using index

Here are some test data:

 INSERT INTO Months (start_date, end_date) VALUES ('2011-03-01', '2011-03-31'), ('2011-02-01', '2011-02-28'), ('2011-01-01', '2011-01-31'), ('2010-12-01', '2010-12-31'); INSERT INTO _pj_cust_email_view VALUES ('ron', '2011-03-10'), ('hermione', '2011-03-15'), ('hermione', '2011-02-15'), ('hermione', '2011-01-15'), ('hermione', '2010-12-15'), ('neville', '2011-01-10'), ('harry', '2011-03-19'), ('harry', '2011-02-10'), ('molly', '2011-03-25'), ('molly', '2011-01-10');

Here, the result provides data, including a combined list of letters, to make it easier to see.

 +---------+---------+--------------------------+---------+----------------------------------+ | month | current | current_email | earlier | earlier_email | +---------+---------+--------------------------+---------+----------------------------------+ | 2010-12 | 1 | hermione | 1 | hermione | | 2011-01 | 3 | neville,hermione,molly | 3 | hermione,molly,neville | | 2011-02 | 2 | hermione,harry | 4 | harry,hermione,molly,neville | | 2011-03 | 4 | molly,ron,harry,hermione | 5 | molly,ron,hermione,neville,harry | +---------+---------+--------------------------+---------+----------------------------------+

MySQL subquery with user variables

More articles: