I am building a front-end for a large database (~10 million rows). The data is water usage for the cargo of different companies, and the table looks something like this:
id | company_id | datetime            | reading | used | cost
==============================================================
1  | 1          | 2012-01-01 00:00:00 | 5000    | 5    | 0.50
2  | 1          | 2012-01-01 00:01:00 | 5015    | 15   | 1.50
....
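For reference, a minimal sketch of the schema as I understand it (the column types and the index are my assumptions based on the sample rows, not the real DDL):

-- Sketch of the readings table; reading is the cumulative meter value,
-- used/cost are the increments since the previous row
CREATE TABLE `readings` (
    `id`         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    `company_id` INT UNSIGNED    NOT NULL,
    `datetime`   DATETIME        NOT NULL,
    `reading`    INT UNSIGNED    NOT NULL,
    `used`       INT UNSIGNED    NOT NULL,
    `cost`       DECIMAL(10,2)   NOT NULL,
    PRIMARY KEY (`id`),
    KEY `idx_company_datetime` (`company_id`, `datetime`)
) ENGINE=InnoDB;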
In the interface, users can choose how they want to view the data, for example in 6-hour increments, daily increments, monthly, and so on. What would be the best way to do this quickly? Given how often the data changes and how many different views of a single data set there are, caching query results in memcache or something similar is almost pointless, and there is no way to pre-calculate everything in advance because there are too many variables.
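To make the problem concrete, a 6-hour view computed on the fly (what option 1 below would run for every request) would look something like this; this is only a sketch, and the bucket expression and filter values are assumptions:

-- On-the-fly 6-hour buckets straight from the raw table (sketch)
SELECT
    `company_id`,
    FROM_UNIXTIME(FLOOR(UNIX_TIMESTAMP(`datetime`) / 21600) * 21600) AS bucket_start,
    MAX(`reading`) AS reading,
    SUM(`used`)    AS used,
    SUM(`cost`)    AS cost
FROM `readings`
WHERE `company_id` = 1
  AND `datetime` >= '2012-01-01 00:00:00'
  AND `datetime` <  '2012-02-01 00:00:00'
GROUP BY `company_id`, bucket_start;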
My suggestion is that some kind of aggregate tables would work, e.g. readings, readings_6h, readings_1d with exactly the same structure, just with the data pre-grouped.
If this is a viable solution, what is the best way to keep the aggregate tables up to date and accurate? Apart from the data coming in from the meters, the tables are read-only; users never need to update or write to them.
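Assuming the aggregate tables look something like the sketch below (the composite primary key is my own addition, meant to guard against double counting, not something already in place):

-- Sketch of one aggregate table; the (company_id, datetime) key means a
-- bucket can only ever be overwritten, never inserted twice
CREATE TABLE `readings_6h` (
    `company_id` INT UNSIGNED  NOT NULL,
    `datetime`   DATETIME      NOT NULL,   -- start of the 6-hour bucket
    `reading`    INT UNSIGNED  NOT NULL,   -- MAX(reading) within the bucket
    `used`       INT UNSIGNED  NOT NULL,   -- SUM(used) within the bucket
    `cost`       DECIMAL(10,2) NOT NULL,   -- SUM(cost) within the bucket
    PRIMARY KEY (`company_id`, `datetime`)
) ENGINE=InnoDB;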
Possible solutions include:
1) Stick with on-the-fly queries using GROUP BY / aggregate functions.
2) Run the basic SELECT and save the results into the aggregate tables:
SELECT `company_id`,
       CONCAT_WS(' ', DATE(`datetime`), '23:59:59') AS `datetime`,
       MAX(`reading`) AS reading,
       SUM(`used`) AS used,
       SUM(`cost`) AS cost
FROM `readings`
WHERE `datetime` > '$lastUpdateDateTime'
GROUP BY `company_id`, DATE(`datetime`)
3) Upsert on a unique key (I am not sure how the aggregation would be performed here, and I would also need to make sure the data stays accurate, with nothing counted twice and no missing rows); a fuller sketch follows after this list:
INSERT INTO `readings_6h` ... SELECT FROM `readings` .... ON DUPLICATE KEY UPDATE .. calculate...
4) other ideas / recommendations?
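For option 3, a complete version of that statement might look like the sketch below. It assumes readings_6h has the unique (company_id, datetime) key shown earlier, and it reuses the $lastUpdateDateTime variable from option 2:

-- Option 3 sketch: upsert 6-hour buckets; re-running it overwrites
-- existing buckets instead of double counting them
INSERT INTO `readings_6h` (`company_id`, `datetime`, `reading`, `used`, `cost`)
SELECT
    `company_id`,
    FROM_UNIXTIME(FLOOR(UNIX_TIMESTAMP(`datetime`) / 21600) * 21600) AS bucket_start,
    MAX(`reading`),
    SUM(`used`),
    SUM(`cost`)
FROM `readings`
WHERE `datetime` >= '$lastUpdateDateTime'
GROUP BY `company_id`, bucket_start
ON DUPLICATE KEY UPDATE
    `reading` = VALUES(`reading`),
    `used`    = VALUES(`used`),
    `cost`    = VALUES(`cost`);

The accuracy worry largely disappears if the re-aggregation always starts from the beginning of the last (possibly partial) bucket rather than from the exact last-update timestamp, so each affected bucket is recomputed in full and simply overwritten.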
I am currently doing option 2, and it takes about 15 minutes to aggregate +-100k rows into +-30k rows across the aggregate tables (_6h, _1d, _7d, _1m, _1y).
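The save step behind that looks roughly like this (a sketch of the daily case, not my exact script):

-- Option 2 sketch: append newly aggregated daily rows; this only stays
-- accurate if $lastUpdateDateTime always falls exactly on a day boundary,
-- otherwise rows get double counted or missed
INSERT INTO `readings_1d` (`company_id`, `datetime`, `reading`, `used`, `cost`)
SELECT `company_id`,
       CONCAT_WS(' ', DATE(`datetime`), '23:59:59') AS `datetime`,
       MAX(`reading`),
       SUM(`used`),
       SUM(`cost`)
FROM `readings`
WHERE `datetime` > '$lastUpdateDateTime'
GROUP BY `company_id`, DATE(`datetime`);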
TL;DR: What is the best way to view/store aggregated data for many different reports that cannot be cached effectively?