We used an external log analyzer for a client project (a large private intranet). The architecture:
- The JS library adds a "web bug": an empty GIF with extra request parameters, served from a dedicated nginx server.
- A log handler picks up the nginx logs, rotates them, and parses the rows into a database, recording each access along with additional metadata. Entries in the db include the UID of the content item, among other useful dimensions.
- The site has read-only access to the same database for statistics queries.
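The log-handler step above can be sketched roughly like this. The log line format, the `uid` query-parameter name, and the `access` table layout are all assumptions for illustration; the real setup will differ.

```python
import re
import sqlite3
from urllib.parse import urlparse, parse_qs

# Assumed nginx access-log layout (a trimmed "combined" format).
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "GET (?P<path>\S+) HTTP/\d\.\d"'
)

def parse_line(line):
    """Extract (uid, timestamp, ip) from one pixel request, or None."""
    m = LOG_RE.match(line)
    if not m:
        return None
    qs = parse_qs(urlparse(m.group("path")).query)
    uid = qs.get("uid", [None])[0]  # "uid" parameter name is an assumption
    if uid is None:
        return None
    return uid, m.group("time"), m.group("ip")

# One raw row per pixel hit; schema is a guess at the described layout.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE access (uid TEXT, ts TEXT, ip TEXT)")

sample = ('10.0.0.5 - - [12/Mar/2012:10:00:00 +0000] '
          '"GET /pixel.gif?uid=abc123&ref=home HTTP/1.1" 200 43')
row = parse_line(sample)
if row:
    conn.execute("INSERT INTO access VALUES (?, ?, ?)", row)
conn.commit()
```

In the real system this would run against the rotated log files rather than a sample string, and write into the shared database the site reads from.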
View counts for a page are then easy: just query the database for the right UID. Ranked lists are not much harder: query the statistics, then use the UID to attach catalog data to the result set.
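Both queries are simple in SQL. A minimal sketch, assuming a raw `access` table keyed by UID and a `catalog` table holding the content metadata (both table names and columns are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE access (uid TEXT, ts TEXT);             -- one row per hit
    CREATE TABLE catalog (uid TEXT PRIMARY KEY, title TEXT);
    INSERT INTO access VALUES ('a1','2012-03-01'),('a1','2012-03-02'),
                              ('b2','2012-03-01');
    INSERT INTO catalog VALUES ('a1','Intranet handbook'),
                               ('b2','Holiday schedule');
""")

# View count for one content item: filter by its UID.
count = conn.execute(
    "SELECT COUNT(*) FROM access WHERE uid = ?", ("a1",)
).fetchone()[0]

# Ranked list: aggregate by UID, then join the catalog metadata onto it.
top = conn.execute("""
    SELECT c.title, COUNT(*) AS hits
    FROM access a JOIN catalog c ON c.uid = a.uid
    GROUP BY a.uid
    ORDER BY hits DESC
""").fetchall()
```

The site's read-only database connection only ever needs these two query shapes.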
The biggest problem we are facing right now is our lack of data-warehousing know-how (turning individual access rows in the database into efficient aggregates), and we are looking at reworking this setup to use Piwik as the statistics engine.
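For what it's worth, the simplest form of the aggregation we are missing is a periodic rollup: collapse the raw per-hit rows into one row per content item per day, so queries scan the small summary table instead of the raw log rows. A sketch under the same assumed schema (the `access_daily` table is hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE access (uid TEXT, ts TEXT);   -- raw rows, one per pixel hit
    CREATE TABLE access_daily (uid TEXT, day TEXT, hits INTEGER,
                               PRIMARY KEY (uid, day));
    INSERT INTO access VALUES
        ('a1','2012-03-01 09:00'),
        ('a1','2012-03-01 14:30'),
        ('a1','2012-03-02 08:15');
""")

# Roll the raw rows up into one row per (uid, day). After this, raw rows
# older than some retention window could be deleted to keep the table small.
conn.execute("""
    INSERT INTO access_daily (uid, day, hits)
    SELECT uid, substr(ts, 1, 10), COUNT(*)
    FROM access
    GROUP BY uid, substr(ts, 1, 10)
""")
conn.commit()

rows = conn.execute("SELECT * FROM access_daily ORDER BY day").fetchall()
```

Piwik does essentially this (and much more) out of the box, which is the main attraction of switching.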
We cannot use Google Analytics in this particular case, but if you do not have such a restriction, I would advise you to look into collective.googleanalytics and see whether it fits your use case.