First, before you do anything else, read Ralph Kimball's The Data Warehouse Toolkit.
You are doing data warehousing, so you need to tackle it as a data warehousing problem. Read up on the proper design patterns for this kind of thing.
[Note: "data warehouse" does not mean insanely big or expensive or complex. It means star schema and smart ways of handling large volumes of data that are never updated.]
SQL databases are slow, but that slowness is what buys you flexible retrieval.
The file system is fast. It is a terrible thing to update, but you are not updating; you are only accumulating.
A typical DW approach to this looks like the following.
Define a "star pattern" for your data. The measurable facts and attributes (“dimensions”) of these facts. Your fact looks like the number of bytes. Everything else (address, timestamp, user ID, etc.) is an aspect of this fact.
Build the dimension data in a master dimension database. It is relatively small (IP addresses, users, a date dimension, etc.). Each dimension will have all the attributes you might ever want to know about. This grows; people are always adding attributes to dimensions.
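To make that concrete, here is a rough sketch of such a star schema as SQLite DDL driven from Python. The table and column names (dim_date, dim_address, dim_user, fact_traffic, dimensions.db) are my own assumptions for illustration, not anything prescribed above.

    import sqlite3

    # A minimal star schema for web-log traffic: one fact table (bytes served)
    # surrounded by small dimension tables. All names are illustrative.
    schema = """
    CREATE TABLE dim_date (
        date_key     INTEGER PRIMARY KEY,
        full_date    TEXT,      -- e.g. '2009-07-14'
        year         INTEGER,
        month        INTEGER,
        day_of_week  TEXT
    );
    CREATE TABLE dim_address (
        address_key  INTEGER PRIMARY KEY,
        ip_address   TEXT UNIQUE
    );
    CREATE TABLE dim_user (
        user_key     INTEGER PRIMARY KEY,
        user_id      TEXT UNIQUE
    );
    CREATE TABLE fact_traffic (
        date_key     INTEGER REFERENCES dim_date(date_key),
        address_key  INTEGER REFERENCES dim_address(address_key),
        user_key     INTEGER REFERENCES dim_user(user_key),
        bytes        INTEGER   -- the measurable fact
    );
    """

    conn = sqlite3.connect("dimensions.db")
    conn.executescript(schema)
    conn.commit()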
Create a load process that takes your logs, parses out the dimension values (times, addresses, users, etc.) and merges the dimension keys in with the measures (number of bytes). The load may update a dimension to add a new user or a new address. Generally, you read raw rows, do lookups, and write fact rows that have all the proper FKs attached to them.
Save these load files to disk. These files are never updated; they just accumulate. Use a simple format like CSV so you can easily bulk-load them.
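A minimal sketch of that load step, assuming whitespace-separated log lines of the form "timestamp ip user bytes" and the schema sketched earlier (the file names and helper function are likewise assumptions):

    import csv
    import sqlite3

    def get_key(conn, table, key_col, value_col, value):
        """Look up a dimension key, inserting a new row if the value is unseen."""
        # Table and column names come from our own code, so f-strings are safe here.
        row = conn.execute(
            f"SELECT {key_col} FROM {table} WHERE {value_col} = ?", (value,)
        ).fetchone()
        if row:
            return row[0]
        cur = conn.execute(f"INSERT INTO {table} ({value_col}) VALUES (?)", (value,))
        return cur.lastrowid

    def load(log_path, fact_path, conn):
        """Read raw log rows, resolve dimension FKs, append fact rows to a CSV file."""
        with open(log_path) as logs, open(fact_path, "a", newline="") as out:
            writer = csv.writer(out)
            for line in logs:
                # Assumed log layout: timestamp ip user bytes
                timestamp, ip, user, nbytes = line.split()
                date_key = get_key(conn, "dim_date", "date_key", "full_date", timestamp[:10])
                addr_key = get_key(conn, "dim_address", "address_key", "ip_address", ip)
                user_key = get_key(conn, "dim_user", "user_key", "user_id", user)
                writer.writerow([date_key, addr_key, user_key, int(nbytes)])
        conn.commit()

    if __name__ == "__main__":
        conn = sqlite3.connect("dimensions.db")
        load("access.log", "facts_2009_07.csv", conn)

The fact file is opened in append mode on purpose: nothing is ever rewritten, new rows only accumulate.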
When someone wants to do an analysis, create a datamart for them.
For the selected IP address or time frame or whatever, pull all the relevant facts, plus the associated master dimension data, and bulk-load that into a datamart.
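Continuing the same assumed names, building a datamart might look roughly like this: copy the small dimension tables wholesale and bulk-load only the fact rows the analyst asked for.

    import csv
    import sqlite3

    def build_datamart(fact_csv, dim_db, mart_db, date_keys):
        """Bulk-load a subset of fact rows, plus all dimension data, into a new mart."""
        mart = sqlite3.connect(mart_db)

        # Copy the (small) dimension tables wholesale from the master dimension DB.
        mart.execute(f"ATTACH DATABASE '{dim_db}' AS dims")
        for table in ("dim_date", "dim_address", "dim_user"):
            mart.execute(f"CREATE TABLE {table} AS SELECT * FROM dims.{table}")

        # Load only the facts the analyst asked for (here: a set of date keys).
        mart.execute("""
            CREATE TABLE fact_traffic (
                date_key INTEGER, address_key INTEGER, user_key INTEGER, bytes INTEGER
            )""")
        with open(fact_csv, newline="") as f:
            rows = (
                (int(d), int(a), int(u), int(b))
                for d, a, u, b in csv.reader(f)
                if int(d) in date_keys
            )
            mart.executemany("INSERT INTO fact_traffic VALUES (?, ?, ?, ?)", rows)
        mart.commit()
        return mart

    # Example: a mart covering date keys 1..31 of the accumulated July facts.
    mart = build_datamart("facts_2009_07.csv", "dimensions.db", "mart_july.db",
                          set(range(1, 32)))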
You can run all the SQL queries you want against this mart. Most queries devolve into SELECT COUNT(*) and SELECT SUM(...) with various GROUP BY, HAVING and WHERE clauses.
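For example, against the mart sketched above, typical questions reduce to aggregates like these:

    import sqlite3

    mart = sqlite3.connect("mart_july.db")

    # Total bytes per IP address, busiest first -- a typical SUM ... GROUP BY query.
    for ip, total in mart.execute("""
            SELECT a.ip_address, SUM(f.bytes) AS total_bytes
            FROM fact_traffic f
            JOIN dim_address a ON a.address_key = f.address_key
            GROUP BY a.ip_address
            HAVING SUM(f.bytes) > 0
            ORDER BY total_bytes DESC"""):
        print(ip, total)

    # Request count per day -- a typical COUNT(*) ... GROUP BY query.
    for day, hits in mart.execute("""
            SELECT d.full_date, COUNT(*) AS hits
            FROM fact_traffic f
            JOIN dim_date d ON d.date_key = f.date_key
            GROUP BY d.full_date"""):
        print(day, hits)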
S. Lott