Optimize Mysql query with time zone conversion and group by hour

This is my spreadsheet in MySql 5.5 with 30 million records

CREATE TABLE `campaign_logs` (
  `domain` varchar(50) DEFAULT NULL,
  `campaign_id` varchar(50) DEFAULT NULL,
  `subscriber_id` varchar(50) DEFAULT NULL,
  `message` varchar(21000) DEFAULT NULL,
  `log_time` datetime DEFAULT NULL,
  `log_type` varchar(50) DEFAULT NULL,
  `level` varchar(50) DEFAULT NULL,
  `campaign_name` varchar(500) DEFAULT NULL,
  KEY `subscriber_id_index` (`subscriber_id`),
  KEY `log_type_index` (`log_type`),
  KEY `log_time_index` (`log_time`),
  KEY `campid_domain_logtype_logtime_subid_index` (`campaign_id`,`domain`,`log_type`,`log_time`,`subscriber_id`),
  KEY `domain_logtype_logtime_index` (`domain`,`log_type`,`log_time`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 |

In the next query, I am doing a group hour by hour with respect to the time zone

QUERY

SELECT 
    log_type
    ,DATE_FORMAT(CONVERT_TZ(log_time,'+00:00','+05:30'),'%l %p') AS log_date
    ,count(*) AS total
    ,count(DISTINCT subscriber_id) d 
FROM
    stats.campaign_logs USE INDEX(campid_domain_logtype_logtime_subid_index)
WHERE
    DOMAIN='xxx' 
    AND campaign_id='123' 
    AND log_type = 'EMAIL_OPENED' 
    AND log_time BETWEEN 
        CONVERT_TZ('2015-02-01 00:00:00','+00:00','+05:30') AND
        CONVERT_TZ('2015-03-01 23:59:58','+00:00','+05:30')
GROUP BY log_date

UNION ALL

SELECT
    log_type
    ,DATE_FORMAT(CONVERT_TZ(log_time,'+00:00','+05:30'),'%l %p') AS log_date
    ,count(*) AS total
    ,count(DISTINCT subscriber_id) d 
FROM
    stats.campaign_logs USE INDEX(campid_domain_logtype_logtime_subid_index) 
WHERE
    DOMAIN='xxx' 
    AND campaign_id='123' 
    AND log_type = 'EMAIL_SENT' 
    AND log_time BETWEEN 
        CONVERT_TZ('2015-02-01 00:00:00','+00:00','+05:30') AND
        CONVERT_TZ('2015-03-01 23:59:58','+00:00','+05:30')
GROUP BY log_date 

UNION ALL 

SELECT 
    log_type
    ,DATE_FORMAT(CONVERT_TZ(log_time,'+00:00','+05:30'),'%l %p') AS log_date
    ,count(*) AS total
    ,count(DISTINCT subscriber_id) d
FROM
    stats.campaign_logs USE INDEX(campid_domain_logtype_logtime_subid_index)
WHERE
    DOMAIN='xxx' 
    AND campaign_id='123' 
    AND log_type = 'EMAIL_CLICKED' 
    AND log_time BETWEEN 
        CONVERT_TZ('2015-02-01 00:00:00','+00:00','+05:30') AND
        CONVERT_TZ('2015-03-01 23:59:58','+00:00','+05:30')
GROUP BY log_date;

results

The above query will produce results similar to this

+---------------+-------+----------------+-------------+
| EMAIL_CLICKED | 1 AM  |             71 |          83 |
| EMAIL_CLICKED | 1 PM  |             25 |          27 |
| EMAIL_SENT    | 10 AM |             51 |          59 |
| EMAIL_OPENED  | 10 PM |             16 |          18 |

This is an explanation of the above query

EXPLAIN

+----+--------------+---------------+-------+-------------------------------------------+-------------------------------------------+---------+------+--------+------------------------------------------+
| id | select_type  | table         | type  | possible_keys                             | key                                       | key_len | ref  | rows   | Extra                                    |
+----+--------------+---------------+-------+-------------------------------------------+-------------------------------------------+---------+------+--------+------------------------------------------+
|  1 | PRIMARY      | campaign_logs | range | campid_domain_logtype_logtime_subid_index | campid_domain_logtype_logtime_subid_index | 468     | NULL |  55074 | Using where; Using index; Using filesort |
|  2 | UNION        | campaign_logs | range | campid_domain_logtype_logtime_subid_index | campid_domain_logtype_logtime_subid_index | 468     | NULL | 330578 | Using where; Using index; Using filesort |
|  3 | UNION        | campaign_logs | range | campid_domain_logtype_logtime_subid_index | campid_domain_logtype_logtime_subid_index | 468     | NULL |   1589 | Using where; Using index; Using filesort |
|NULL| UNION RESULT | <union1,2,3>  | ALL   | NULL                                      | NULL                                      | NULL    | NULL |   NULL |                                          |
+----+--------------+---------------+-------+-------------------------------------------+-------------------------------------------+---------+------+--------+------------------------------------------+

OPTIMIZATION?

We have a coverage index on this table.

This request takes a lot of time (more than 1 minute).

If I remove distinct_count(subscriber_id)from the query, we will get the results in 1.5 seconds, but I need distinct_countfrom subscriber_idthe query.

Is there any way to optimize this query?

thank

+4
source share
2 answers

, group by 40 - , .

( log_type):

SELECT log_type,
       DATE_FORMAT(CONVERT_TZ(log_time,'+00:00','+05:30'),'%l %p') AS time,
       count(DISTINCT subscriber_id) AS distinct_count,
       count(subscriber_id) AS total_count
FROM stats.campaign_logs
WHERE DOMAIN = 'xxxx' AND
      campaign_id='1234' AND
      log_type = 'EMAIL_SENT' AND
      log_time BETWEEN CONVERT_TZ('2015-02-07 00:00:00','+00:00','+05:30') AND CONVERT_TZ('2015-02-14 23:59:58','+00:00','+05:30')
GROUP BY time;

. , union all . , union all , OR/IN - .

+3
SELECT 
    log_type
    ,DATE_FORMAT(CONVERT_TZ(log_time,'+00:00','+05:30'),'%l %p') AS log_date
    ,count(*) AS total
    ,count(DISTINCT subscriber_id) d 
FROM
    stats.campaign_logs USE INDEX(campid_domain_logtype_logtime_subid_index)
WHERE
    DOMAIN='xxx' 
    AND campaign_id='123' 
    AND log_type IN ('EMAIL_OPENED','EMAIL_SENT','EMAIL_CLICKED')
    AND log_time BETWEEN 
        CONVERT_TZ('2015-02-01 00:00:00','+00:00','+05:30') AND
        CONVERT_TZ('2015-03-01 23:59:58','+00:00','+05:30')
GROUP BY log_date, log_type

, ?

-1

All Articles