DataTables: speed up server side processing with multiple tables, grouping and html content?

I use datatables to display data from multiple mySQL tables (actually 7 of them). Currently, there is not much data, but I see "Displaying 1 to 7 of 7 records (filtered out of 642,660,480 total records)." and it takes 20 seconds to display a total of 7 entries. As soon as I really start adding a lot of content to the database, I am sure that it will not be suitable.

I'm sure there are better ways to accomplish what I'm trying to do, but this is the only way I was able to get it to work.

Here is my server side script:

$q1 = "'"; $q2 = '"'; $order_id = "CONCAT( ".$q2."<input type='hidden' id='order_id' value='".$q2.", o.id, ".$q2."'><a href='order_details.php?id=".$q2.", o.id, ".$q2."'><img src='https://pas.greysignal.com/img/search.png' border='0'></a> &nbsp;".$q2.", o.id )"; $patient_name = "CONCAT( ".$q2."<input type='hidden' id='patient_name' value='".$q2.", p.first_name, ' ', p.last_name, ".$q2."'><input type='hidden' id='patient_id' value='".$q2.", p.id, ".$q2."'><input type='hidden' id='patient_ssn' value='".$q2.", p.ssn, ".$q2."'><a href='patient_details.php?id=".$q2.", p.id, ".$q2."'><img src='https://pas.greysignal.com/img/search.png' border='0'></a> &nbsp;".$q2.", p.first_name, ' ', p.last_name )"; $doc_name = "CONCAT( ".$q2."<input type='hidden' id='doctor_name' value='".$q2.", d.first_name, ' ', d.last_name, ".$q2."'><input type='hidden' id='doctor_id' value='".$q2.", d.id, ".$q2."'><a href='doctor_details.php?id=".$q2.", d.id, ".$q2."'><img src='https://pas.greysignal.com/img/search.png' border='0'></a> &nbsp;".$q2.", d.first_name, ' ', d.last_name )"; $order_date = "FROM_UNIXTIME(o.created_timestamp, '%m/%e/%Y')"; $tests = "GROUP_CONCAT(t.name SEPARATOR ', ')"; $aColumns = array($order_id, $order_date, $doc_name, $patient_name, $tests, 'o.status'); /* Indexed column (used for fast and accurate table cardinality) */ $sIndexColumn = "o.id"; /* DB table to use */ $sTable = "`orders` o, `patients` p, `doctors` d, `tests_ordered` tst, `tests` t, `users` u, `events` e"; $sWhere = "WHERE p.id = o.patient_id AND d.id = o.doctor_id AND tst.order_id = o.id AND t.id = tst.test_id AND u.username = o.assigned_username AND e.event_id = o.event_id"; $order_status = isset($_GET['status']) ? $_GET['status'] : 'all'; if($order_status == 'all'){ }else{ $sWhere .= " AND (o.status='Complete' OR o.status='$order_status')"; } $sGroupBy = "GROUP BY o.id"; /* Database connection information */ $gaSql['user'] = DB_USER; $gaSql['password'] = DB_PASSWORD; $gaSql['db'] = DB_NAME; $gaSql['server'] = DB_SERVER; /* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * If you just want to use the basic configuration for DataTables with PHP server-side, there is * no need to edit below this line */ /* * MySQL connection */ $gaSql['link'] = mysql_pconnect( $gaSql['server'], $gaSql['user'], $gaSql['password'] ) or die( 'Could not open connection to server' ); mysql_select_db( $gaSql['db'], $gaSql['link'] ) or die( 'Could not select database '. $gaSql['db'] ); /* * Paging */ $sLimit = ""; if ( isset( $_GET['iDisplayStart'] ) && $_GET['iDisplayLength'] != '-1' ) { $sLimit = "LIMIT ".mysql_real_escape_string( $_GET['iDisplayStart'] ).", ". mysql_real_escape_string( $_GET['iDisplayLength'] ); } /* * Ordering */ if ( isset( $_GET['iSortCol_0'] ) ) { $sOrder = "ORDER BY "; for ( $i=0 ; $i<intval( $_GET['iSortingCols'] ) ; $i++ ) { if ( $_GET[ 'bSortable_'.intval($_GET['iSortCol_'.$i]) ] == "true" ) { $sOrder .= $aColumns[ intval( $_GET['iSortCol_'.$i] ) ]." ".mysql_real_escape_string( $_GET['sSortDir_'.$i] ) .", "; } } $sOrder = substr_replace( $sOrder, "", -2 ); if ( $sOrder == "ORDER BY" ) { $sOrder = ""; } } /* * Filtering * NOTE this does not match the built-in DataTables filtering which does it * word by word on any field. It possible to do here, but concerned about efficiency * on very large tables, and MySQL regex functionality is very limited */ if ( $_GET['sSearch'] != "" ) { $sWhere .= " AND ("; for ( $i=0 ; $i<count($aColumns) ; $i++ ) { if($i!=4){ //skip tests column $sWhere .= $aColumns[$i]." LIKE '%".mysql_real_escape_string( $_GET['sSearch'] )."%' OR "; } } $sWhere = substr_replace( $sWhere, "", -3 ); $sWhere .= ')'; } /* Individual column filtering */ for ( $i=0 ; $i<count($aColumns) ; $i++ ) { if($i!=4){ //skip tests column if ( $_GET['bSearchable_'.$i] == "true" && $_GET['sSearch_'.$i] != '' ) { if ( $sWhere == "" ) { $sWhere = "WHERE "; } else { $sWhere .= " AND "; } $sWhere .= $aColumns[$i]." LIKE '%".mysql_real_escape_string($_GET['sSearch_'.$i])."%' "; } } } /* * SQL queries * Get data to display */ $sQuery = " SELECT SQL_CALC_FOUND_ROWS ".str_replace(" , ", " ", implode(", ", $aColumns))." FROM $sTable $sWhere $sGroupBy $sOrder $sLimit "; //echo $sQuery; //die(); $rResult = mysql_query( $sQuery, $gaSql['link'] ) or die(mysql_error()); /* Data set length after filtering */ $sQuery = " SELECT FOUND_ROWS() "; $rResultFilterTotal = mysql_query( $sQuery, $gaSql['link'] ) or die(mysql_error()); $aResultFilterTotal = mysql_fetch_array($rResultFilterTotal); $iFilteredTotal = $aResultFilterTotal[0]; /* Total data set length */ $sQuery = " SELECT COUNT(".$sIndexColumn.") FROM $sTable "; $rResultTotal = mysql_query( $sQuery, $gaSql['link'] ) or die(mysql_error()); $aResultTotal = mysql_fetch_array($rResultTotal); $iTotal = $aResultTotal[0]; //added to hide filtering //$iTotal = $iFilteredTotal; /* * Output */ $output = array( "sEcho" => intval($_GET['sEcho']), "iTotalRecords" => $iTotal, "iTotalDisplayRecords" => $iFilteredTotal, "aaData" => array() ); while ( $aRow = mysql_fetch_array( $rResult ) ) { $row = array(); for ( $i=0 ; $i<count($aColumns) ; $i++ ) { if ( $aColumns[$i] != ' ' ) { /* General output */ $row[] = $aRow[$i]; } } $output['aaData'][] = $row; } echo json_encode( $output ); 

Here is the request that the server side of the script generates:

 SELECT SQL_CALC_FOUND_ROWS CONCAT( "<input type='hidden' id='order_id' value='", o.id, "'><a href='order_details.php?id=", o.id, "'><img src='search.png' border='0'></a> &nbsp;", o.id ), FROM_UNIXTIME(o.created_timestamp, '%m/%e/%Y'), CONCAT( "<input type='hidden' id='doctor_name' value='", d.first_name, ' ', d.last_name, "'><input type='hidden' id='doctor_id' value='", d.id, "'><a href='doctor_details.php?id=", d.id, "'><img src='search.png' border='0'></a> &nbsp;", d.first_name, ' ', d.last_name ), CONCAT( "<input type='hidden' id='patient_name' value='", p.first_name, ' ', p.last_name, "'><input type='hidden' id='patient_id' value='", p.id, "'><input type='hidden' id='patient_ssn' value='", p.ssn, "'><a href='patient_details.php?id=", p.id, "'><img src='search.png' border='0'></a> &nbsp;", p.first_name, ' ', p.last_name ), GROUP_CONCAT(t.name SEPARATOR ', '), o.status FROM `orders` o, `patients` p, `doctors` d, `tests_ordered` tst, `tests` t, `users` u, `events` e WHERE p.id = o.patient_id AND d.id = o.doctor_id AND tst.order_id = o.id AND t.id = tst.test_id AND u.username = o.assigned_username AND e.event_id = o.event_id AND (o.status='Complete' OR o.status='Draft') GROUP BY o.id 

I am trying to determine what I can do to optimize this without interfering with the data search and data sorting functions. I created indexes and set primary keys for all tables as well as I know. Is it possible to use JOIN?

Here is the output of the EXPLAIN statement:

 id select_type table type possible_keys key key_len ref rows Extra 1 SIMPLE u index NULL PRIMARY 32 NULL 5 Using index; Using temporary; Using filesort 1 SIMPLE o ALL PRIMARY,patient_id,doctor_id,event_id,assigned_use... NULL NULL NULL 6 Using where 1 SIMPLE d eq_ref PRIMARY PRIMARY 4 pasdbadmin.o.doctor_id 1 1 SIMPLE e eq_ref PRIMARY PRIMARY 4 pasdbadmin.o.event_id 1 Using index 1 SIMPLE tst ref order_id,test_id order_id 4 pasdbadmin.o.id 1 1 SIMPLE t eq_ref PRIMARY PRIMARY 4 pasdbadmin.tst.test_id 1 1 SIMPLE p eq_ref PRIMARY PRIMARY 4 pasdbadmin.o.patient_id 1 

UPDATE:

The problem seems to have been related to usage, including tables of users and events in the request (none of them were actually used). Here is the corrected code that runs much faster:

 $q1 = "'"; $q2 = '"'; $order_id = "CONCAT( ".$q2."<input type='hidden' id='order_id' value='".$q2.", o.id, ".$q2."'><a href='order_details.php?id=".$q2.", o.id, ".$q2."'><img src='https://pas.greysignal.com/img/search.png' border='0'></a> &nbsp;".$q2.", o.id )"; $patient_name = "CONCAT( ".$q2."<input type='hidden' id='patient_name' value='".$q2.", p.first_name, ' ', p.last_name, ".$q2."'><input type='hidden' id='patient_id' value='".$q2.", p.id, ".$q2."'><input type='hidden' id='patient_ssn' value='".$q2.", p.ssn, ".$q2."'><a href='patient_details.php?id=".$q2.", p.id, ".$q2."'><img src='https://pas.greysignal.com/img/search.png' border='0'></a> &nbsp;".$q2.", p.first_name, ' ', p.last_name )"; $doc_name = "CONCAT( ".$q2."<input type='hidden' id='doctor_name' value='".$q2.", d.first_name, ' ', d.last_name, ".$q2."'><input type='hidden' id='doctor_id' value='".$q2.", d.id, ".$q2."'><a href='doctor_details.php?id=".$q2.", d.id, ".$q2."'><img src='https://pas.greysignal.com/img/search.png' border='0'></a> &nbsp;".$q2.", d.first_name, ' ', d.last_name )"; $order_date = "FROM_UNIXTIME(o.created_timestamp, '%m/%e/%Y')"; $tests = "GROUP_CONCAT(t.name SEPARATOR ', ')"; $aColumns = array($order_id, $order_date, $doc_name, $patient_name, $tests, 'o.status'); /* Indexed column (used for fast and accurate table cardinality) */ $sIndexColumn = "o.id"; /* DB table to use */ $sTable = "`orders` o, `patients` p, `doctors` d, `tests_ordered` tst, `tests` t"; $sWhere = "WHERE p.id = o.patient_id AND d.id = o.doctor_id AND tst.order_id = o.id AND t.id = tst.test_id"; $order_status = isset($_GET['status']) ? $_GET['status'] : 'all'; if($order_status == 'all'){ }else{ $sWhere .= " AND (o.status='Complete' OR o.status='$order_status')"; } $sJoin = ""; $sGroupBy = "GROUP BY o.id"; /* Database connection information */ $gaSql['user'] = DB_USER; $gaSql['password'] = DB_PASSWORD; $gaSql['db'] = DB_NAME; $gaSql['server'] = DB_SERVER; /* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * If you just want to use the basic configuration for DataTables with PHP server-side, there is * no need to edit below this line */ /* * MySQL connection */ $gaSql['link'] = mysql_pconnect( $gaSql['server'], $gaSql['user'], $gaSql['password'] ) or die( 'Could not open connection to server' ); mysql_select_db( $gaSql['db'], $gaSql['link'] ) or die( 'Could not select database '. $gaSql['db'] ); /* * Paging */ $sLimit = ""; if ( isset( $_GET['iDisplayStart'] ) && $_GET['iDisplayLength'] != '-1' ) { $sLimit = "LIMIT ".mysql_real_escape_string( $_GET['iDisplayStart'] ).", ". mysql_real_escape_string( $_GET['iDisplayLength'] ); } /* * Ordering */ if ( isset( $_GET['iSortCol_0'] ) ) { $sOrder = "ORDER BY "; for ( $i=0 ; $i<intval( $_GET['iSortingCols'] ) ; $i++ ) { if ( $_GET[ 'bSortable_'.intval($_GET['iSortCol_'.$i]) ] == "true" ) { $sOrder .= $aColumns[ intval( $_GET['iSortCol_'.$i] ) ]." ".mysql_real_escape_string( $_GET['sSortDir_'.$i] ) .", "; } } $sOrder = substr_replace( $sOrder, "", -2 ); if ( $sOrder == "ORDER BY" ) { $sOrder = ""; } } /* * Filtering * NOTE this does not match the built-in DataTables filtering which does it * word by word on any field. It possible to do here, but concerned about efficiency * on very large tables, and MySQL regex functionality is very limited */ if ( $_GET['sSearch'] != "" ) { $sWhere .= " AND ("; for ( $i=0 ; $i<count($aColumns) ; $i++ ) { if($i!=4){ //skip tests column $sWhere .= $aColumns[$i]." LIKE '%".mysql_real_escape_string( $_GET['sSearch'] )."%' OR "; } } $sWhere = substr_replace( $sWhere, "", -3 ); $sWhere .= ')'; } /* Individual column filtering */ for ( $i=0 ; $i<count($aColumns) ; $i++ ) { if($i!=4){ //skip tests column if ( $_GET['bSearchable_'.$i] == "true" && $_GET['sSearch_'.$i] != '' ) { if ( $sWhere == "" ) { $sWhere = "WHERE "; } else { $sWhere .= " AND "; } $sWhere .= $aColumns[$i]." LIKE '%".mysql_real_escape_string($_GET['sSearch_'.$i])."%' "; } } } /* * SQL queries * Get data to display */ $sQuery = " SELECT SQL_CALC_FOUND_ROWS ".str_replace(" , ", " ", implode(", ", $aColumns))." FROM $sTable $sWhere $sJoin $sGroupBy $sOrder $sLimit "; $filename = __DIR__.DIRECTORY_SEPARATOR."sql_log.txt"; file_put_contents($filename, $sQuery, FILE_APPEND); $rResult = mysql_query( $sQuery, $gaSql['link'] ) or die(mysql_error()); /* Data set length after filtering */ $sQuery = " SELECT FOUND_ROWS() "; $rResultFilterTotal = mysql_query( $sQuery, $gaSql['link'] ) or die(mysql_error()); $aResultFilterTotal = mysql_fetch_array($rResultFilterTotal); $iFilteredTotal = $aResultFilterTotal[0]; /* Total data set length */ $sQuery = " SELECT COUNT(".$sIndexColumn.") FROM $sTable "; $rResultTotal = mysql_query( $sQuery, $gaSql['link'] ) or die(mysql_error()); $aResultTotal = mysql_fetch_array($rResultTotal); $iTotal = $aResultTotal[0]; //added to hide filtering //$iTotal = $iFilteredTotal; /* * Output */ $output = array( "sEcho" => intval($_GET['sEcho']), "iTotalRecords" => $iTotal, "iTotalDisplayRecords" => $iFilteredTotal, "aaData" => array() ); while ( $aRow = mysql_fetch_array( $rResult ) ) { $row = array(); for ( $i=0 ; $i<count($aColumns) ; $i++ ) { if ( $aColumns[$i] != ' ' ) { /* General output */ $row[] = $aRow[$i]; } } $output['aaData'][] = $row; } echo json_encode( $output ); 

If I used JOINS, updated request:

  SELECT SQL_CALC_FOUND_ROWS CONCAT( "<input type='hidden' id='order_id' value='", o.id, "'><a href='order_details.php?id=", o.id, "'><img src='search.png' border='0'></a> &nbsp;", o.id ), FROM_UNIXTIME(o.created_timestamp, '%m/%e/%Y'), CONCAT( "<input type='hidden' id='doctor_name' value='", d.first_name, ' ', d.last_name, "'><input type='hidden' id='doctor_id' value='", d.id, "'><a href='doctor_details.php?id=", d.id, "'><img src='search.png' border='0'></a> &nbsp;", d.first_name, ' ', d.last_name ), CONCAT( "<input type='hidden' id='patient_name' value='", p.first_name, ' ', p.last_name, "'><input type='hidden' id='patient_id' value='", p.id, "'><input type='hidden' id='patient_ssn' value='", p.ssn, "'><a href='patient_details.php?id=", p.id, "'><img src='search.png' border='0'></a> &nbsp;", p.first_name, ' ', p.last_name ), GROUP_CONCAT(t.name SEPARATOR ', '), o.status FROM `orders` o JOIN `doctors` d ON d.id = o.doctor_id JOIN `patients` p ON p.id = o.patient_id JOIN `tests_ordered` tst ON tst.order_id = o.id JOIN `tests` t ON t.id = tst.test_id WHERE o.status='Complete' OR o.status='Draft' GROUP BY o.id 

The problem is that DataTables is simply not designed to work properly with JOINS when sorting and filtering are used due to an array of columns, etc. I would like to see a solution that will work with such a request though.

+6
source share
3 answers

To get started, if you want to optimize your SQL expression, first get rid of this HTML crap there. If nothing else, it confuses the actual structure of the Statement. If you need, you can return it at the end of your optimization, although I would seriously vote against it: you have PHP for markup. For clarity, I'm used to using JOIN clauses, I rephrased all Thing accordingly.

This process gave me the following:

 SELECT SQL_CALC_FOUND_ROWS, o.id, o.created_timestamp, o.status, d.id, d.first_name, d.last_name, p.id, p.first_name, p.last_name, p.ssn GROUP_CONCAT(t.name SEPARATOR ', '), FROM `orders` o JOIN `doctors` d ON d.id = o.doctor_id JOIN `patients` p ON p.id = o.patient_id JOIN `users` u ON u.username = o.assigned_username JOIN `events` e ON e.event_id = o.event_id JOIN `tests_ordered` tst ON tst.order_id = o.id JOIN `tests` t ON t.id = tst.test_id WHERE o.status='Complete' OR o.status='Draft' GROUP BY o.id 

There are a few things here:

1) Your main table orders . This is also the one you use for the WHERE clause, and you are GROUPING BY. With id as the primary index and the second index on status this should not be too bad.

2) You link four more tables because I assume these are foreign keys. Two of these tables are never actually used: none (perhaps there is no need to join users and events ). You should get rid of those that, by the way, also remove the less star-shaped connection in the text column (username). Make sure that the other doctors and patients tables have their primary keys in the corresponding id columns.

3) You have a more complex join in the two tables tests_ordered and tests . All that does is give you a concatenated name string, but it adds the complexity of the GROUP BY clause. There are two ways to go from here: try to optimize these associations or remove them completely from the list.

3) Solution A To optimize these joins, make sure you have an index on tests_ordered.order_id and tests_ordered.test_id and a primary index on tests.id . Your expression should look like this:

 SELECT SQL_CALC_FOUND_ROWS, o.id, o.created_timestamp, o.status, d.id, d.first_name, d.last_name, p.id, p.first_name, p.last_name, p.ssn GROUP_CONCAT(t.name SEPARATOR ', '), FROM `orders` o JOIN `doctors` d ON d.id = o.doctor_id JOIN `patients` p ON p.id = o.patient_id JOIN `tests_ordered` tst ON tst.order_id = o.id JOIN `tests` t ON t.id = tst.test_id WHERE o.status='Complete' OR o.status='Draft' GROUP BY o.id 

3) Solution B Delete all tests / tests_ordered materials and put them in a separate file. Now your main choice will look like this:

 SELECT SQL_CALC_FOUND_ROWS, o.id, o.created_timestamp, o.status, d.id, d.first_name, d.last_name, p.id, p.first_name, p.last_name, p.ssn FROM `orders` o JOIN `doctors` d ON d.id = o.doctor_id JOIN `patients` p ON p.id = o.patient_id WHERE o.status='Complete' OR o.status='Draft' 

But you will need to run an additional SELECT for each line to get the associated name t.name, or you just have one SELECT for all order identifiers on the current page. The latter will look like this:

 SELECT o.id, GROUP_CONCAT(t.name SEPARATOR ', '), FROM `orders` o JOIN `tests_ordered` tst ON tst.order_id = o.id JOIN `tests` t ON t.id = tst.test_id WHERE o.in IN ( <put the 10 ids on your current page here, separated by commas> ) GROUP BY o.id 

Solution A should run pretty quickly on a decent machine. Solution B should run pretty quickly on any machine. Direct JOINS on indexed foreign keys is cheap.

4) None of the above options should return 642 million rows in a database with less than 642 million orders, and I believe that you do not. Since MySQL tells you that it probably had to build a complete Cartesian product, which also explains the high speed. This means that one of your direct foreign keys has joined you. Most likely, we are talking about a connection on users - a useless connection in the first place, but in any case: check the uniqueness of your username column.

+5
source

1) A few tips may be to remove all concats and unnecessary functions that you can use PHP for the presentation level, and select only the column values.

2) Select only the most important columns that are absolutely crutial in your schema, and then you can run a simple 1 row. Selects for other attributes. (although the number of requests may increase, it can actually speed things up, since multiple connections are usually very slow)

3) If nothing helps you, it may lead to some form of denormalization when you need to copy certain relationship values ​​across multiple tables to reduce the number of joins.

4) Cache, cache, cahce - at the database level, at the php level, mostly everywhere ...

5) In addition, it depends on your data, for example, on 700,000,000 rows, if you need to use only the most recent 100,000,000, and then develop some kind of archiving function to reduce the number of records in the operating tables and only if the record is in the archive, then search in the large archive table, which can be slow, but will happen much less ...

These are just a couple of very common skills.

0
source

In addition to @azzit's nice answer, I can give you some tips that we used to query the server sides on datatables:

  • avoiding the concat group , this will always make your queries very slow, even possibly using temporary tables, which you should avoid. So instead, expect your paginated result and run additional data (e.g. tests_ordered data here) based on the actual identifiers returned after pagination of the main request. Decision
  • the one used above can become problematic if you need to filter the main request on the tests_ordered elements. But in fact, the solution is to add EXISTS subsections to filter the main query for abstracts of 1-> n related tables (and with existing subqueries you don't need abstract tables in the main query, only when the subquery is when the filter)
  • Make one counting request with the necessary associations and conditions and try to record the results until the filters are changed (you can ignore the order by modifications).
  • do not keep order ; you must not allow all columns with order to be used . When ordering a query, the query will be very slow if a non-index column is used. Especially you should not allow ordering in multiple value columns, such as tests_ordered.
  • If you have a large number of joins and you think that your query is written in the correct order, use STRAIGHT_JOIN to select to avoid losing seconds in optimizer calculations (heppen with a really large number of joins).
  • Use left joins for columns that may be null, but try not to put them in the query if this data is not used in order, filter or not in the visible column.
  • test your request with explain , feel free to add filter options and browser options. You will see that using functions such as FROM_UNIXTIME may interfere with the use of the index.
  • the strategy used for columns with multiple values ​​can always be applied to all columns, perform a minimal query to get your paginated identifiers (only identifiers in columns, add joins requested for conditions or order, add a subquery for multiple value queries that have conditions), calculate the global result and apply pagination to it. Then, according to the obtained results 10/25/50/100, all cell data is loaded based on the row identifier.
0
source

All Articles