Recommendations for building reporting statistics with PHP and MySQL for a web app

Background:

I "inherited" a PHP web app at my small company, and after many years of grumbling I finally got the go-ahead to throw out the spaghetti code and start again.

We want to log every action performed in the system, for example:

  • user X scanned item Y
  • user X updated item Y
  • new item Y added in city Z

and then provide graphs at different resolutions (day, month, year) of the actions performed in the system.

In the previous version we had a table with 20,000,000 records going back to 2005, which should give you an idea of the amount of data we already have, and that is only one of many statistics.

Actual question:

What would you recommend for building a near-real-time system that produces these statistics?

Notes:

  • Graph rendering is already covered by the Google Charts API.
  • I am not opposed to using NoSQL databases, message queues, cron jobs, or anything else that gets the job done, but I would prefer a MySQL/PHP solution.
  • My current train of thought is to create a table for each statistic I want to keep, plus several aggregation tables (by day, month, and year) to cache the results.
  • I know this is a broad question, but any suggestions are welcome.
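To make the aggregation-table idea concrete, here is one possible shape for a per-day cache table and its periodic roll-up. This is only a sketch: the table and column names (`stats_scans_daily`, `actions`, `time`, `action`) are hypothetical, and the roll-up would be run periodically, e.g. from a cron job.

```sql
-- Hypothetical per-statistic cache table: one row per day.
CREATE TABLE stats_scans_daily (
  day   DATE         NOT NULL PRIMARY KEY,
  total INT UNSIGNED NOT NULL DEFAULT 0
);

-- Periodic roll-up from an assumed raw action log table `actions`.
-- Re-running it refreshes existing rows instead of duplicating them.
INSERT INTO stats_scans_daily (day, total)
SELECT DATE(time), COUNT(*)
FROM actions
WHERE action = 'scanned'
GROUP BY DATE(time)
ON DUPLICATE KEY UPDATE total = VALUES(total);
```

Monthly and yearly tables would follow the same pattern, grouping by `DATE_FORMAT(time, '%Y-%m')` or `YEAR(time)` instead.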
3 answers

If every user action should be logged, I would go with a fully normalized solution.

 USERS TABLE          OBJECTS TABLE
 -----------          -------------
 user_id (primary)    object_id (primary)

 USERS_TO_OBJECTS TABLE
 ----------------------
 user_id     (index)
 object_id   (index)
 time        (index)
 action      (index)
 object_type (index)  // optional; could be useful to speed things up

This setup will likely give you maximum flexibility in charting, and it will also be pretty fast, since you can skip joining the users or objects tables when you don't need them.
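A minimal MySQL sketch of that schema, assuming the table and column names from the layout above; column types, sizes, and index names are my own guesses and should be adjusted to the real data:

```sql
CREATE TABLE users (
  user_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY
  -- ... other user columns
);

CREATE TABLE objects (
  object_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY
  -- ... other object columns
);

-- The action log: one row per (user, object, action) event.
CREATE TABLE users_to_objects (
  user_id     INT UNSIGNED NOT NULL,
  object_id   INT UNSIGNED NOT NULL,
  time        DATETIME     NOT NULL,
  action      VARCHAR(20)  NOT NULL,
  object_type VARCHAR(20)  NOT NULL,
  KEY idx_user   (user_id),
  KEY idx_object (object_id),
  KEY idx_time   (time),
  KEY idx_action (action),
  KEY idx_type   (object_type)
);
```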

Edit:

Say city X (id 9876) was updated by user 123 (id 1234) ...

 1234    - user_id     (the user that did the action)
 9876    - object_id   (the object the action was done on)
 xyz     - time
 updated - action      (so that you can select only specific actions)
 city    - object_type (so that you can select only specific objects)

I loaded this table with 40M rows and the results are pretty acceptable.

0.002 secs for a simple COUNT of the cities UPDATED in the last WEEK. The data was inserted randomly.
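A count like the one timed above could look roughly like this, using the `users_to_objects` layout from this answer and the `updated`/`city` values from the example row (the timing will of course depend on hardware and index choices):

```sql
SELECT COUNT(*)
FROM users_to_objects
WHERE action      = 'updated'
  AND object_type = 'city'
  AND time >= NOW() - INTERVAL 1 WEEK;
```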

Edit 2

If you end up with a really huge table, you can resort to MySQL partitioning, and your schema stays as it is. I don't really know how you are going to query the tables, but you could:

PARTITION BY RANGE. Organize partitions by date; every new month or so you get a new partition.

PARTITION BY KEY. Organize partitions by action; each action goes into its corresponding partition.

You can read more about partitioning on the MySQL website, and this article gives you some details on fine-tuning partitions.
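As a sketch, monthly RANGE partitioning of the action table could look like the following. The partition names and boundary dates are purely illustrative, and note that MySQL requires the partitioning column to be part of every unique key on the table:

```sql
ALTER TABLE users_to_objects
PARTITION BY RANGE (TO_DAYS(time)) (
  PARTITION p2011_01 VALUES LESS THAN (TO_DAYS('2011-02-01')),
  PARTITION p2011_02 VALUES LESS THAN (TO_DAYS('2011-03-01')),
  PARTITION p_future VALUES LESS THAN MAXVALUE
);
```

Queries with a date range in the WHERE clause can then be pruned to only the relevant partitions.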


You might consider a "database" such as Redis. Redis stores its list type as a linked list, so you can simply LPUSH onto such a list:

 $action = array(
     "type"      => "page_view",
     "url"       => "/example/path",
     "user"      => "user1",
     "timestamp" => $_SERVER["REQUEST_TIME"]
 );

 $r = new Redis();
 $r->connect("127.0.0.1", 6379); // Redis connection info (host/port are placeholders)
 $r->lPush("global_history", json_encode($action));

The LPUSH operation runs in O(1), so this should not be a bottleneck. Reading the results back can be a little more expensive (O(n) for list ranges). Redis is also entirely memory-based (though the developers are working on virtual-memory support), so it is fast, but you can easily run out of room.

I use this technique to record history and statistics for a music site that I run. It has been very successful and provides very fast results with very little effort.

If you want to make it more robust, you could use Hadoop or another technology to pull from the end of the list (RPOP) and archive the results in a more permanent format (for example, XML stored on Amazon S3). You could then feed your Google charts from the archives (static, pre-compiled data) when viewing older data, and from Redis for the most recent data.

Hope this helps!


I'm glad you finally got the green light to redesign this project. This answer is probably too late, but MongoDB (with its capped collections feature) would do such a job well. It is not too time-consuming to implement, so there may still be a chance.

In any case, I hope everything works out.

