Mysql php optimization

I was assigned the task of creating some graphic characteristics for a website from some stored data.

Facts: - 3 databases are used. dbCurrent, dbStats, dbBackup. dbCurrent - The main database of the dbStats website contains various statistics tables and tracking data. dbBackup stores the last five years of statistics / tracking tables.

  • the data that I will use must come from two databases (dbStats, dbBackup)
  • table names: stats2006, stats2007, stats2008, etc., with the exception of current statistics, which are only "statistics". Each table has data for the year.
  • The table structure for each year of data is the same: primaryID - integer field ProductID is an integer field DateMonitor is an integer (unixtimestamp) pageName - varchar (20)
  • productID, dateMonitor, pageName fields also have indexes

In other words, which product was viewed on which day and from which page.

So, I thought I was creating a loop from each table and getting my data. Each of my requests is as follows:

Select COUNT(primaryID) as myCounter FROM $tablename WHERE $conditions 

where $ tablename and $ conditions are variables based on each loop. All conditions are similar:

  • dateMonitor between date1 and date2
  • pageName = 'some val'
  • productID IN ($ comma_separated_values)
  • combination of the above

They all work decently so far (for a single product).

When I try to create a report to compare products 'x' to 'y' years (dynamically selected from admin / moderator), the script runs for more than 15 minutes.

I am looking for a way to improve script performance. The logic / structure that I use so far follows:

 Loop through products to find the ids to use (typical format is: x,y,z (comma separated values) Open Loop through years/months Execute one sql query for each affected table/database to get the number of affected rows. Close year loop Send data to graph script (jquery jqPlot to be exact) to print on screen 

Any help / idea appreciated.

EDIT: Based on @Narf's suggestion with UNION ALL, I built 1 single query based on 12 subselect statements:

 SELECT COUNT(*) AS monthlyTotal FROM db1.table1 WHERE dateMonitor>='1167606001' AND dateMonitor<='1170284399' AND dateMonitor='test' UNION ALL SELECT COUNT(*) AS monthlyTotal FROM db1.table2 WHERE dateMonitor>='1170284401' AND dateMonitor<='1172703599' AND dateMonitor='test' ... 

Each select statement refers to a duration of one month. Demo code:

 for ($m=1; $m<=12; $m++) { $startDate = mktime(0, 0, 1, $m, 1, $myYear); $daysOfMonth = date("t", mktime(10, 10, 10, $m, 10, $myYear)); $endDate = mktime(23, 59, 59, $m, $daysOfMonth, $myYear); $query_chk1 .= "SELECT COUNT(*) AS monthlyTotal FROM db1.table1 WHERE dateMonitor>='$startDate' AND dateMonitor<='$endDate' AND pageName='test' UNION ALL "; } $query_chk1 = substr($query_chk1, 0, -10); 

EDIT2: after creating combined indexes (as suggested by @ypercube), I see some slight reduction in runtime.

Now I have an average runtime of 11 minutes (the original time was 15-17 minutes)

This has helped to significantly reduce runtime.

Thanks.

+4
source share
2 answers

Not much that you can do, at least since you have specified all your columns ... here is the best I can think of:

 SELECT COUNT(*) FROM `stats` WHERE `productID IN(1,2,3) AND `dateMonitor` >= <unixtime from> AND `dateMonitor` <= <unixtime to> AND `pageName`='<value>' 

... And How:

  • As ypercube commented, using COUNT(*) is faster.
  • I don't know this for sure, but I believe that using >= and <= instead of BETWEEN for integers should be faster.

Another thing you should try is to complete all the queries (if more than one) at once. It would be harder for me to correctly explain this in words, and I see that you have a good understanding of SQL, so you should be able to get the logic, here is an example:

Let's say we need to look for products with identifiers 123, 13, 5, and 6 from May 2006 to April 2008 and pageName 'test':

  • We calculate timestamps before creating a query and determine exactly which tables to look for.

    SELECT COUNT (*) AS myCounter FROM stats2006 WHERE productID IN (5,6,13,123) AND dateMonitor > = 1146430800 AND pageName = 'test'

    / * Here we only need to check the timestamp from May 1, 2006, 00:00:00 * /

    UNION ALL

    SELECT COUNT (*) AS myCounter FROM stats2007 WHERE productID IN (5,6,13,123) AND pageName = 'test'

    / * Here we do not need to check the dateMonitor field because the whole year corresponds to our period * /

    UNION ALL

    SELECT COUNT (*) AS myCounter FROM stats2008 WHERE productID IN (5,6,13,123) AND dateMonitor <= 1209589199 AND pageName = 'test'

    / * Here we only need to check the time stamp of April 30, 2008, 23:59:59 * /

+1
source

When you compare products 'x' to 'y', why aren't you using GROUP BY? For instance:

 Select productID, COUNT(primaryID) as myCounter FROM $tablename WHERE $conditions GROUP BY productID 

this will reduce the number of requests and speed up the process.

0
source

All Articles