I was assigned the task of creating some graphic characteristics for a website from some stored data.
Facts: - 3 databases are used. dbCurrent, dbStats, dbBackup. dbCurrent - The main database of the dbStats website contains various statistics tables and tracking data. dbBackup stores the last five years of statistics / tracking tables.
- the data that I will use must come from two databases (dbStats, dbBackup)
- table names: stats2006, stats2007, stats2008, etc., with the exception of current statistics, which are only "statistics". Each table has data for the year.
- The table structure for each year of data is the same: primaryID - integer field ProductID is an integer field DateMonitor is an integer (unixtimestamp) pageName - varchar (20)
- productID, dateMonitor, pageName fields also have indexes
In other words, which product was viewed on which day and from which page.
So, I thought I was creating a loop from each table and getting my data. Each of my requests is as follows:
Select COUNT(primaryID) as myCounter FROM $tablename WHERE $conditions
where $ tablename and $ conditions are variables based on each loop. All conditions are similar:
- dateMonitor between date1 and date2
- pageName = 'some val'
- productID IN ($ comma_separated_values)
- combination of the above
They all work decently so far (for a single product).
When I try to create a report to compare products 'x' to 'y' years (dynamically selected from admin / moderator), the script runs for more than 15 minutes.
I am looking for a way to improve script performance. The logic / structure that I use so far follows:
Loop through products to find the ids to use (typical format is: x,y,z (comma separated values) Open Loop through years/months Execute one sql query for each affected table/database to get the number of affected rows. Close year loop Send data to graph script (jquery jqPlot to be exact) to print on screen
Any help / idea appreciated.
EDIT: Based on @Narf's suggestion with UNION ALL, I built 1 single query based on 12 subselect statements:
SELECT COUNT(*) AS monthlyTotal FROM db1.table1 WHERE dateMonitor>='1167606001' AND dateMonitor<='1170284399' AND dateMonitor='test' UNION ALL SELECT COUNT(*) AS monthlyTotal FROM db1.table2 WHERE dateMonitor>='1170284401' AND dateMonitor<='1172703599' AND dateMonitor='test' ...
Each select statement refers to a duration of one month. Demo code:
for ($m=1; $m<=12; $m++) { $startDate = mktime(0, 0, 1, $m, 1, $myYear); $daysOfMonth = date("t", mktime(10, 10, 10, $m, 10, $myYear)); $endDate = mktime(23, 59, 59, $m, $daysOfMonth, $myYear); $query_chk1 .= "SELECT COUNT(*) AS monthlyTotal FROM db1.table1 WHERE dateMonitor>='$startDate' AND dateMonitor<='$endDate' AND pageName='test' UNION ALL "; } $query_chk1 = substr($query_chk1, 0, -10);
EDIT2: after creating combined indexes (as suggested by @ypercube), I see some slight reduction in runtime.
Now I have an average runtime of 11 minutes (the original time was 15-17 minutes)
This has helped to significantly reduce runtime.
Thanks.