Mysql view with group performance issue

I have a table that collects data for web page performance. There are several machines that test multiple sites in 10 minutes, so I currently have about 700,000 lines (920 MB) with +/- 50,000 new lines per day.

Table Source:

SET SQL_MODE="NO_AUTO_VALUE_ON_ZERO"; CREATE TABLE `http_perf_raw_log` ( `run_dt` int(11) DEFAULT NULL, `dataset` varchar(64) DEFAULT NULL, `runner` varchar(64) DEFAULT NULL, `site` varchar(128) DEFAULT NULL, `machine` varchar(32) DEFAULT NULL, `called_url` varchar(1024) DEFAULT NULL, `method` varchar(8) DEFAULT NULL, `url` varchar(1024) DEFAULT NULL, `content_type` varchar(64) DEFAULT NULL, `http_code` int(11) DEFAULT NULL, `header_size` int(11) DEFAULT NULL, `request_size` int(11) DEFAULT NULL, `filetime` int(11) DEFAULT NULL, `ssl_verify_result` int(11) DEFAULT NULL, `redirect_count` int(11) DEFAULT NULL, `total_time` decimal(6,4) DEFAULT NULL, `namelookup_time` decimal(6,4) DEFAULT NULL, `connect_time` decimal(6,4) DEFAULT NULL, `pretransfer_time` decimal(6,4) DEFAULT NULL, `starttransfer_time` decimal(6,4) DEFAULT NULL, `redirect_time` decimal(6,4) DEFAULT NULL, `size_upload` int(11) DEFAULT NULL, `size_download` int(11) DEFAULT NULL, `speed_download` int(11) DEFAULT NULL, `speed_upload` int(11) DEFAULT NULL, `download_content_length` int(11) DEFAULT NULL, `upload_content_length` int(11) DEFAULT NULL, `certinfo` varchar(1024) DEFAULT NULL, `request_header` varchar(1024) DEFAULT NULL, `return_content` varchar(4096) DEFAULT NULL, `return_headers` varchar(2048) DEFAULT NULL, KEY `run_dt_idx` (`run_dt`), KEY `dataset_idx` (`dataset`), KEY `runner_idx` (`runner`), KEY `site_idx` (`site`), KEY `machine_idx` (`machine`), KEY `total_time_idx` (`total_time`) ) ENGINE=MyISAM DEFAULT CHARSET=latin1; 

To aggregate statistics (with a resolution of 1 hour) I created a view:

 CREATE OR REPLACE VIEW http_perf_stats (dataset, runner, site, machine, day, hour, calls, total_time, namelookup_time, connect_time, pretransfer_time, starttransfer_time, size_download) AS SELECT dataset, runner, site, machine, DATE_FORMAT(run_dt, '%Y-%m-%d') AS day, DATE_FORMAT(run_dt, '%k') AS hour, COUNT(*) AS calls, SUM(total_time), SUM(namelookup_time), SUM(connect_time), SUM(pretransfer_time), SUM(starttransfer_time), SUM(size_download) FROM http_perf_raw_log GROUP BY runner, site, machine, day, hour ORDER BY `day` DESC 

But the performance of VIEW (and the underlying SELECT) is terrible - it takes about 4 seconds.

So my questions are:

1. Uses GROUP BY in a good idea? And if not, then which alternative is better?

2. Is there (I think, yes, I am not an SQL expert: /) a way to optimize this SELECT (changing the request or the http_perf_raw_log structure)?

+4
source share
3 answers

In this case, it might be a good idea to only create statistics periodically (once per hour, for example).

I would do it as follows. Run the following code once to create the table structure.

 CREATE TABLE http_perf_stats AS SELECT dataset, runner, site, machine, DATE_FORMAT(run_dt, '%Y-%m-%d') AS day, DATE_FORMAT(run_dt, '%k') AS hour, COUNT(*) AS calls, SUM(total_time), SUM(namelookup_time), SUM(connect_time), SUM(pretransfer_time), SUM(starttransfer_time), SUM(size_download) FROM http_perf_raw_log GROUP BY runner, site, machine, day, hour ORDER BY `day` DESC 

Make some changes, such as changing field types, default values, adding a primary key, and possibly adding some indexes, so you can quickly access and query this table.

From now on, update the table as follows:

 START TRANSACTION; DELETE FROM http_perf_stats; INSERT INTO TABLE SELECT dataset, runner, site, machine, DATE_FORMAT(run_dt, '%Y-%m-%d') AS day, DATE_FORMAT(run_dt, '%k') AS hour, COUNT(*) AS calls, SUM(total_time), SUM(namelookup_time), SUM(connect_time), SUM(pretransfer_time), SUM(starttransfer_time), SUM(size_download) FROM http_perf_raw_log GROUP BY runner, site, machine, day, hour ORDER BY `day` DESC; COMMIT; 

A few ways to do this:

  • Create a MySQL event (see http://dev.mysql.com/doc/refman/5.1/en/create-event.html ) (what would I do)

  • Create a cron job (systems with a unique taste) or a window scheduler task

  • Make a lazy update. When someone asks for this list, run the code above, if it was last run, more than x minutes / hours ago. Thus, it looks more like a cache. Slowly on first request, fast after. But you will not slow down the server if someone is not interested in this.

+2
source

Remove GROUP BY from VIEW and use it in SELECT , which calls VIEW .

+1
source

A view is another SELECT query, but distracted to simplify a query for a set of results. If the underlying SELECT is slow, then the view. Reading and summing 1 GB of data in four seconds does not sound slower for me at all.

0
source

All Articles