You actually have two problems here, and both will need to be addressed further down the road.

One of them, which you haven't asked about yet but will run into sooner than you'd like, is insert bandwidth into your stats table.

The other, which you did ask about, is actually using the stats.
Let's start with insert bandwidth.
First, if you do this, don't track stats on pages that could use caching. Have a PHP script advertise itself as an empty JavaScript file (or as a one-pixel image) and include the latter on the pages you're tracking. Doing so lets you readily cache the remaining content of your site.
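For illustration, a minimal sketch of such a beacon, assuming a hypothetical record_hit() helper that does the memcache bookkeeping shown below (the helper name and the GIF payload are mine, not part of your setup):

    <?php
    // track.php -- included as <img src="/track.php?page_id=123"> on tracked pages
    $page_id = (int) $_GET['page_id'];

    record_hit($page_id); // hypothetical helper: the memcache pseudo-code below

    // serve a 1x1 transparent GIF; forbid caching of the beacon itself
    header('Content-Type: image/gif');
    header('Cache-Control: no-store');
    echo base64_decode('R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7');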
In the telecommunications business, instead of doing actual inserts for call billing as calls happen, records are accumulated in memory and periodically synced to disk. Doing so lets them handle gigantic throughput while keeping the hard drives happy.
To proceed similarly on your end, you'll need an atomic operation and some in-memory storage. Here is some memcache-based pseudo-code for the first part...
For each page, you need a memcache variable. In memcache, increment() is atomic, but add(), set(), and so forth are not. So you need to be wary of not miscounting hits when concurrent processes add the same page at the same time:
    $ns = $memcache->get('stats-namespace');
    while (!$memcache->increment("stats-$ns-$page_id")) {
        // increment() fails if the key doesn't exist yet: create it at 0 and
        // retry; add() is a no-op if a concurrent process already created it
        $memcache->add("stats-$ns-$page_id", 0, 1800); // garbage collect in 30 minutes
        $db->upsert('needs_stats_refresh', array($ns, $page_id)); // engine = memory
    }
Periodically, say every 5 minutes (configure the memcache timeout accordingly), you'll want to sync all of this to the database, without any possibility of concurrent processes affecting each other or the existing hit counts. For this, you increment the namespace before doing anything (this gives you a lock on the existing data, for all intents and purposes), and sleep a little so that existing processes that reference the prior namespace can finish up if needed:
    $ns = $memcache->get('stats-namespace');
    $memcache->increment('stats-namespace');
    sleep(60); // let concurrent page loads that still hold the old namespace finish
Once that is done, you can safely loop through your page ids, update the stats accordingly, and clean up the needs_stats_refresh table. The latter only needs two fields: page_id int pkey, ns_id int. There's a bit more to it than simple select, insert, update, and delete statements run from your scripts, however, so continuing...
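To make that loop concrete, here is a hedged sketch of the sync step; the stats table layout, the cnt column, and the $db helper methods are assumptions of mine, continuing the pseudo-code style from above:

    // flush every counter that accumulated under the old namespace $ns
    foreach ($db->select('needs_stats_refresh', array('ns_id' => $ns)) as $row) {
        $key = "stats-$ns-" . $row['page_id'];
        $count = $memcache->get($key);
        if ($count > 0) {
            // one subtotal row per page per sync window, not one row per hit
            $db->insert('stats', array('page_id' => $row['page_id'], 'cnt' => $count));
        }
        $memcache->delete($key);
    }
    $db->delete('needs_stats_refresh', array('ns_id' => $ns));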
As another replier suggested, it's quite appropriate to maintain intermediate stats for your purpose: store batches of hits rather than individual hits. At most, I'm assuming you want hourly or quarter-hourly stats, so it's fine to deal with subtotals that are batch-loaded every 15 minutes.
Even more importantly for your sake, since you're ordering posts using these totals, you want to store the aggregated totals and have an index on the latter. (We'll get to where further down.)
One way to maintain the totals is to add a trigger which, on insert into or update of the stats table, adjusts the stat totals as needed.
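As a sketch, assuming a cnt subtotal column on the stats table and the stat_totals layout given further down (both of which are my assumptions), such a trigger could look like:

    CREATE TRIGGER stats_after_insert AFTER INSERT ON stats
    FOR EACH ROW
        UPDATE stat_totals
           SET total = total + NEW.cnt,
               weekly_total = weekly_total + NEW.cnt
         WHERE page_id = NEW.page_id;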
When doing so, be especially wary of deadlocks. While no two $ns runs will mix their respective stats, there is still a (however slim) possibility that two or more processes fire up the "increment $ns" step described above concurrently, and subsequently issue statements that seek to update the counts concurrently. Obtaining an advisory lock is the simplest, safest, and fastest way to avoid the problems related to this.
Assuming you use an advisory lock, it's perfectly OK to use total = total + subtotal in the update statement.
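In MySQL, advisory locks are available through GET_LOCK() and RELEASE_LOCK(); a minimal sketch, with an arbitrary lock name and timeout:

    SELECT GET_LOCK('stats_flush', 10);  -- returns 1 once the lock is acquired
    -- ... run the stats inserts and totals updates here ...
    SELECT RELEASE_LOCK('stats_flush');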
While on the topic of locks, note that updating the totals will require an exclusive lock on each affected row. Since you're ordering by them, you don't want them all processed in one go, because that may mean keeping an exclusive lock for an extended duration. The simplest approach here is to process the inserts into stats in smaller batches (say, 1000), each followed by a commit.
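A hedged sketch of that batching, staying with the $db pseudo-code from above (the chunk size and helper names are illustrative):

    foreach (array_chunk($pending_rows, 1000) as $batch) {
        $db->begin();
        foreach ($batch as $row) {
            $db->insert('stats', $row); // the trigger updates stat_totals per row
        }
        $db->commit(); // releases the exclusive row locks taken by the trigger
    }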
For intermediary stats (monthly, weekly), add a few boolean fields (bit or tinyint in MySQL) to your stats table. Have each of them store whether the row still counts toward the monthly, weekly, daily, etc. totals. Place a trigger on them as well, such that they increase or decrease the applicable totals in your stat_totals table.
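For instance, flag columns along these lines (the column names are mine), so a periodic job can clear expired flags and an update trigger can back the hits out of the matching totals:

    ALTER TABLE stats
        ADD COLUMN in_daily_total   TINYINT NOT NULL DEFAULT 1,
        ADD COLUMN in_weekly_total  TINYINT NOT NULL DEFAULT 1,
        ADD COLUMN in_monthly_total TINYINT NOT NULL DEFAULT 1;

    -- when a flag flips from 1 to 0, subtract the row's hits from that total
    CREATE TRIGGER stats_after_update AFTER UPDATE ON stats
    FOR EACH ROW
        UPDATE stat_totals
           SET weekly_total = weekly_total
                 + (NEW.in_weekly_total - OLD.in_weekly_total) * NEW.cnt
         WHERE page_id = NEW.page_id;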
As a closing note, give some thought to where you want to store the actual count. It needs to be an indexed field, and the latter is going to be heavily updated. Typically, you'll want it stored in its own table, rather than in the pages table, in order to avoid cluttering your pages table with (much wider) dead rows.
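A possible layout for that table, matching the weekly_total column used elsewhere in this answer (the exact column set is an assumption):

    CREATE TABLE stat_totals (
        page_id       INT NOT NULL PRIMARY KEY,
        total         INT NOT NULL DEFAULT 0,
        daily_total   INT NOT NULL DEFAULT 0,
        weekly_total  INT NOT NULL DEFAULT 0,
        monthly_total INT NOT NULL DEFAULT 0,
        KEY idx_weekly_total (weekly_total)
    ) ENGINE=InnoDB;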
Assuming you've done all of the above, your final query becomes:
    select p.*
    from pages p
    join stat_totals s using (page_id)
    order by s.weekly_total desc
    limit 10
It should be plenty fast with the index on weekly_total.
Lastly, let's not forget the most obvious: if you're running these same total/monthly/weekly/etc. queries over and over, their results should be placed in memcache too.
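For completeness, a minimal sketch of that last step (the key name and the 5-minute TTL are arbitrary):

    $top_pages = $memcache->get('top-pages-weekly');
    if ($top_pages === false) {
        $top_pages = $db->query('select p.* from pages p
                                 join stat_totals s using (page_id)
                                 order by s.weekly_total desc limit 10');
        $memcache->set('top-pages-weekly', $top_pages, 300); // cache for 5 minutes
    }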