I am building a front-end for a large database (~10 million rows). The data is water usage for the cargo of different companies, and the table looks something like this:
id | company_id | datetime            | reading | used | cost
==============================================================
1  | 1          | 2012-01-01 00:00:00 | 5000    | 5    | 0.50
2  | 1          | 2012-01-01 00:01:00 | 5015    | 15   | 1.50
....
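For reference, a minimal sketch of the schema as I understand it (the column types and the index are my assumptions based on the sample rows, not the real DDL):

-- Sketch of the readings table; reading is the cumulative meter value,
-- used/cost are the increments since the previous row
CREATE TABLE `readings` (
    `id`         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    `company_id` INT UNSIGNED    NOT NULL,
    `datetime`   DATETIME        NOT NULL,
    `reading`    INT UNSIGNED    NOT NULL,
    `used`       INT UNSIGNED    NOT NULL,
    `cost`       DECIMAL(10,2)   NOT NULL,
    PRIMARY KEY (`id`),
    KEY `idx_company_datetime` (`company_id`, `datetime`)
) ENGINE=InnoDB;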
In the interface, users can choose how they want to view the data, for example in 6-hour increments, daily increments, monthly, and so on. What would be the best way to do this quickly? Given how often the data changes and how many different views of a single data set there are, caching query results in memcache or something similar is almost pointless, and there is no way to pre-calculate everything in advance because there are too many variables.
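To make the problem concrete, a 6-hour view computed on the fly (what option 1 below would run for every request) would look something like this; this is only a sketch, and the bucket expression and filter values are assumptions:

-- On-the-fly 6-hour buckets straight from the raw table (sketch)
SELECT
    `company_id`,
    FROM_UNIXTIME(FLOOR(UNIX_TIMESTAMP(`datetime`) / 21600) * 21600) AS bucket_start,
    MAX(`reading`) AS reading,
    SUM(`used`)    AS used,
    SUM(`cost`)    AS cost
FROM `readings`
WHERE `company_id` = 1
  AND `datetime` >= '2012-01-01 00:00:00'
  AND `datetime` <  '2012-02-01 00:00:00'
GROUP BY `company_id`, bucket_start;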
My suggestion is that some kind of aggregate tables would work, e.g. readings, readings_6h, readings_1d with exactly the same structure, just with the data pre-grouped.
If this is a viable solution, what is the best way to keep the aggregate tables up to date and accurate? Apart from the data coming in from the meters, the tables are read-only; users never need to update or write to them.
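Assuming the aggregate tables look something like the sketch below (the composite primary key is my own addition, meant to guard against double counting, not something already in place):

-- Sketch of one aggregate table; the (company_id, datetime) key means a
-- bucket can only ever be overwritten, never inserted twice
CREATE TABLE `readings_6h` (
    `company_id` INT UNSIGNED  NOT NULL,
    `datetime`   DATETIME      NOT NULL,   -- start of the 6-hour bucket
    `reading`    INT UNSIGNED  NOT NULL,   -- MAX(reading) within the bucket
    `used`       INT UNSIGNED  NOT NULL,   -- SUM(used) within the bucket
    `cost`       DECIMAL(10,2) NOT NULL,   -- SUM(cost) within the bucket
    PRIMARY KEY (`company_id`, `datetime`)
) ENGINE=InnoDB;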
Possible solutions include:
1) Stick with on-the-fly queries using GROUP BY / aggregate functions.
2) Run the basic SELECT and save the results into the aggregate tables:
SELECT `company_id`,
       CONCAT_WS(' ', DATE(`datetime`), '23:59:59') AS `datetime`,
       MAX(`reading`) AS reading,
       SUM(`used`) AS used,
       SUM(`cost`) AS cost
FROM `readings`
WHERE `datetime` > '$lastUpdateDateTime'
GROUP BY `company_id`, DATE(`datetime`)
3) Upsert on a unique key (I am not sure how the aggregation would be performed here, and I would also need to make sure the data stays accurate, with nothing counted twice and no missing rows); a fuller sketch follows after this list:
INSERT INTO `readings_6h` ... SELECT FROM `readings` .... ON DUPLICATE KEY UPDATE .. calculate...
4) other ideas / recommendations?
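For option 3, a complete version of that statement might look like the sketch below. It assumes readings_6h has the unique (company_id, datetime) key shown earlier, and it reuses the $lastUpdateDateTime variable from option 2:

-- Option 3 sketch: upsert 6-hour buckets; re-running it overwrites
-- existing buckets instead of double counting them
INSERT INTO `readings_6h` (`company_id`, `datetime`, `reading`, `used`, `cost`)
SELECT
    `company_id`,
    FROM_UNIXTIME(FLOOR(UNIX_TIMESTAMP(`datetime`) / 21600) * 21600) AS bucket_start,
    MAX(`reading`),
    SUM(`used`),
    SUM(`cost`)
FROM `readings`
WHERE `datetime` >= '$lastUpdateDateTime'
GROUP BY `company_id`, bucket_start
ON DUPLICATE KEY UPDATE
    `reading` = VALUES(`reading`),
    `used`    = VALUES(`used`),
    `cost`    = VALUES(`cost`);

The accuracy worry largely disappears if the re-aggregation always starts from the beginning of the last (possibly partial) bucket rather than from the exact last-update timestamp, so each affected bucket is recomputed in full and simply overwritten.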
I am currently doing option 2, and it takes about 15 minutes to aggregate +-100k rows into +-30k rows across the aggregate tables (_6h, _1d, _7d, _1m, _1y).
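The save step behind that looks roughly like this (a sketch of the daily case, not my exact script):

-- Option 2 sketch: append newly aggregated daily rows; this only stays
-- accurate if $lastUpdateDateTime always falls exactly on a day boundary,
-- otherwise rows get double counted or missed
INSERT INTO `readings_1d` (`company_id`, `datetime`, `reading`, `used`, `cost`)
SELECT `company_id`,
       CONCAT_WS(' ', DATE(`datetime`), '23:59:59') AS `datetime`,
       MAX(`reading`),
       SUM(`used`),
       SUM(`cost`)
FROM `readings`
WHERE `datetime` > '$lastUpdateDateTime'
GROUP BY `company_id`, DATE(`datetime`);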
TL;DR: What is the best way to view/store aggregated data for many different reports that cannot be cached effectively?