How to write an efficient hit counter for websites

I want to write a hit counter script to track hits on images on a website and the originating IP addresses. Impressions run to hundreds of thousands per day, so the counters will be incremented many times per second.

I am looking for a simple, standalone method (PHP or Python scripts, etc.). I was thinking about using MySQL to track this, but I suspect there is a more efficient way. What are good methods for storing counters?

+6
python php mysql tracking
9 answers

A fascinating subject. Incrementing a counter, simple as it may be, has to be a transaction ... which means that it can lock the whole database for longer than makes sense! -) It can easily become the bottleneck for the whole system.

If you need strictly accurate counts but don't need them to be instantly up to date, my favorite approach is to append the counting information to a log (rotating logs as often as necessary for data freshness purposes). Once a log is closed (with thousands of counting events in it), a script can read it and update everything that is needed in a single transaction. That may be counterintuitive, but it is much faster than thousands of single locks.

Then there are very fast counters that are only statistically accurate, but since you don't say that such inaccuracy is acceptable, I will not explain them in more detail.
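
A minimal sketch of that log-then-batch idea, assuming a tab-separated log line per hit and SQLite standing in for whatever database is actually used. The file layout, the `hits` table, and the function names are illustrative assumptions, not part of the answer above.

```python
import sqlite3
from collections import Counter


def append_hit(log_path, image, ip):
    """Record one hit by appending a line to the current log (no DB lock)."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(f"{image}\t{ip}\n")


def init_db(conn):
    """Create the (hypothetical) counter table if it does not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS hits ("
        "image TEXT, ip TEXT, count INTEGER NOT NULL, "
        "PRIMARY KEY (image, ip))"
    )
    conn.commit()


def apply_closed_log(log_path, conn):
    """Fold every hit in a closed (rotated) log into the DB in one transaction."""
    totals = Counter()
    with open(log_path, encoding="utf-8") as log:
        for line in log:
            image, _, ip = line.rstrip("\n").partition("\t")
            if image and ip:
                totals[(image, ip)] += 1

    with conn:  # commits on success, rolls back if anything raises
        for (image, ip), n in totals.items():
            updated = conn.execute(
                "UPDATE hits SET count = count + ? WHERE image = ? AND ip = ?",
                (n, image, ip),
            ).rowcount
            if updated == 0:
                conn.execute(
                    "INSERT INTO hits (image, ip, count) VALUES (?, ?, ?)",
                    (image, ip, n),
                )
    return sum(totals.values())
```

The request path only ever appends to a file; the database sees one transaction per rotated log rather than one per hit.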

+7

You can take your web server's access log (Apache: access.log) and evaluate it periodically (cron job), provided you do not need to have the data at hand the moment someone visits your site.

As a rule, access.log is created in any case and contains the requested resource, as well as the time, date, and IP address of the user. This way you do not need to route all traffic through a PHP script. Lean, mean counting machine.
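
As a rough illustration, the following sketch counts image hits per (path, IP) from lines in Apache's common or combined log format. The list of image suffixes and the choice to count only 200 and 304 responses are assumptions made for the example.

```python
import re
from collections import Counter

# Matches the leading fields of Apache's common/combined log format:
# ip ident user [time] "METHOD path protocol" status ...
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) '
)

IMAGE_SUFFIXES = (".png", ".jpg", ".jpeg", ".gif")  # assumed set of image types
COUNTED_STATUSES = {"200", "304"}  # assumption: served or "not modified"


def count_image_hits(lines):
    """Return a Counter mapping (path, ip) to the number of image hits."""
    hits = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m or m["status"] not in COUNTED_STATUSES:
            continue  # skip malformed lines and failed requests
        path = m["path"].split("?", 1)[0]  # ignore any query string
        if path.lower().endswith(IMAGE_SUFFIXES):
            hits[(path, m["ip"])] += 1
    return hits
```

A cron job could pass an open log file to `count_image_hits` (a file object iterates line by line) and write the resulting totals wherever they are needed.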

+4

There are two very simple ways:

  • Parse it out of your web logs in batch mode.
  • Funnel the hits through beanstalkd or gearmand and let a worker do the heavy lifting in a controlled manner.

Option 1 works with off-the-shelf tools. Option 2 requires only a little programming, but gives you something closer to real-time updates without causing you to fall over when traffic spikes (as you would in the direct MySQL case).
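
To make the shape of option 2 concrete, here is a small sketch that uses Python's in-process `queue.Queue` as a stand-in for beanstalkd or gearmand. It is not their client API; the `STOP` sentinel, the `store` callback, and the batch size are assumptions for illustration.

```python
import queue
import threading
from collections import Counter

STOP = object()  # sentinel telling the worker to finish (illustrative)


def enqueue_hit(hits, image, ip):
    """Called from the request path: cheap, never touches the database."""
    hits.put((image, ip))


def run_worker(hits, store, batch_size=500):
    """Drain the queue and hand aggregated counts to `store` in batches."""
    batch = Counter()
    pending = 0
    while True:
        item = hits.get()  # blocks until a hit (or STOP) arrives
        if item is STOP:
            break
        batch[item] += 1
        pending += 1
        if pending >= batch_size:
            store(batch)  # e.g. one database transaction per batch
            batch, pending = Counter(), 0
    if pending:
        store(batch)  # flush whatever is left on shutdown
```

Because the worker controls how often `store` runs, a traffic spike makes the queue longer instead of multiplying database writes.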

+2

Without a doubt, Redis is perfect for this problem. It takes about a minute to download and install, supports atomic increments, is incredibly fast, has client libraries for Python and PHP (and many other languages), and is durable (snapshots, append-only log, replication).

Store each counter in its own key. Then simply run:

INCR key 
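
From Python this might look like the sketch below. It assumes `client` is a connected redis-py client (for example `redis.Redis()`), whose `incr` and `hincrby` methods map to the Redis `INCR` and `HINCRBY` commands. The key names `hits:<image>` and `hits_by_ip:<image>` are made up for the example.

```python
def record_hit(client, image, ip):
    """Atomically bump the total for an image and its per-IP breakdown.

    `client` is assumed to be a redis-py client such as redis.Redis();
    the key naming scheme is hypothetical.
    """
    total = client.incr(f"hits:{image}")          # INCR hits:<image>
    client.hincrby(f"hits_by_ip:{image}", ip, 1)  # HINCRBY hits_by_ip:<image> <ip> 1
    return total
```

Each command is atomic on the server, so concurrent web processes can call `record_hit` without any locking of their own.
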
+2

If accuracy is important, you can do it a bit more slowly with MySQL: create a HEAP / MEMORY table to store your counter values. These tables live in memory, which makes them incredibly fast. You can write the data to a regular table at intervals.

Building on the App Engine ideas, you can use memcache as temporary storage for your counter. Incrementing a memcache counter is faster than using MySQL heap tables (I think). Every five or ten seconds you can read the memcache counter and write that number to your database.
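
The buffer-then-flush pattern can be sketched as follows. A lock-protected in-process `Counter` stands in for memcache here, so this is not memcache client code; `HitBuffer`, `flush`, and the `write_to_db` callback are hypothetical names.

```python
import threading
from collections import Counter


class HitBuffer:
    """In-memory stand-in for a fast temporary counter store."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counts = Counter()

    def incr(self, key, amount=1):
        """Cheap increment performed on every hit."""
        with self._lock:
            self._counts[key] += amount

    def drain(self):
        """Return the buffered counts and reset them in one locked step,
        so no hit is lost or counted twice between flushes."""
        with self._lock:
            drained, self._counts = self._counts, Counter()
        return drained


def flush(buffer, write_to_db):
    """Move buffered counts to durable storage; returns hits flushed."""
    drained = buffer.drain()
    if drained:
        write_to_db(dict(drained))  # e.g. one UPDATE per key in a transaction
    return sum(drained.values())
```

Calling `flush` every five or ten seconds (from a timer or a cron-style job) keeps database writes at a fixed rate regardless of traffic. The trade-off is that counts still sitting in the buffer are lost if the process dies.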

+1

Not sure if this is up your alley, but App Engine is a pretty good development platform. Sample code that you can use to build a counter with its Datastore and transactions is described here: http://code.google.com/appengine/docs/python/datastore/transactions.html .

0

You can use Redis. It is a very fast key-value store with support for atomic increments. If the need arises, the counters can easily be split across several servers.

0

I did something very similar, on a similar scale (several servers, hundreds of domains, several thousand hits per hour), and log file analysis was definitely the way to go. (It also checked the number of hits, weighted them by file type, and blacklisted IP addresses on the firewall if they made too many requests. Its goal was to block bad bots automatically, not just to count, but counting was an essential part of it.)

There is no performance impact on the web server process itself, since it does no additional work there. You can easily publish periodically updated hit counts by inserting them into the site database every minute / 5 minutes / 100 hits / whatever, without having to lock the corresponding row / table / database (depending on the locking mechanism used) on every hit.

0

Well, if you happen to go the PHP route, you can use an SQLite database. However, MySQL is a very reasonable way to store this information, and usually (at least from what I have seen) that is how it is done.

If you do not want to store the IP address or any other information, a simple number in a text file may work.
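
A minimal sketch of such a text-file counter, assuming a Unix-like system where `fcntl.flock` is available. The exclusive lock is what keeps concurrent requests from overwriting each other's increments; the function name is illustrative.

```python
import fcntl
import os


def increment_file_counter(path):
    """Add 1 to the integer stored in `path` and return the new value."""
    # Open read/write, creating the file on first use without truncating it.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    with os.fdopen(fd, "r+") as f:
        fcntl.flock(f, fcntl.LOCK_EX)  # held until the file is closed
        text = f.read().strip()
        value = int(text) + 1 if text else 1
        f.seek(0)
        f.truncate()  # drop the old digits before writing the new ones
        f.write(str(value))
        f.flush()
    return value
```

Every hit serializes on the file lock, so this is likely to suit low-traffic counters better than the volume described in the question.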

-1
