I am a lone developer for a telecommunications company, and after some tips on developing a database from someone who has some time to respond.
I insert ~ 2 million rows every day into one table, these tables are then archived and compressed monthly. Each monthly table contains ~ 15,000,000 rows. Although it is increasing every month.
For each insert that I do above, I combine the data from the rows that belong to each other and creates another βcorrelatedβ table. This table is not currently archived, as I need to make sure that I never miss an update in a correlated table. (I hope that makes sense) Although in general this information should remain fairly static after several days of processing.
All of the above works fine. However, my company now wants to run some statistics against this data, and these tables are getting too large to provide results in what would be considered reasonable time. Even with the appropriate indexes.
So, I think, after all of the above, my question is pretty simple. Should I write a script that groups data from my correlated table into smaller tables. Or should I store the query result set in something like memcache? I already use mysqls cache, but due to limited control over how long the data is stored, it does not work perfectly.
The main advantages that I can see in using memcache:
- Does not block my correlated table after the query has been cashed.
- Greater flexibility for sharing collected data between the backend collector and the front processor. (i.e., user reports can be written to the backend and the results of their storage in the cache under the key, which is then shared with anyone who would like to see the data of this report).
- Redundancy and scalability if we start sharing this data with a large number of customers.
The main disadvantages I can see are using something like memcache:
- Data is not saved if the machine is rebooted / the cache is cleared.
Key Benefits of Using MySql
- Permanent data.
- Less code changes (although adding something like memcache is trivial anyway)
Key disadvantages of using MySql
- You need to define table templates every time I want to save a new set of grouped data.
- You must write a program that iterates over the correlated data and populates these new tables.
- Potentially, it will still grow more slowly as the data fills.
I apologize for the rather long question. In any case, it helped me to write down these thoughts, and any advice / help / experience with similar problems would be greatly appreciated.
Many thanks.
Alan
sql mysql caching memcached
Alan hollis
source share