Why the PHP version is faster than MySQL

I have two very large tables to combine, and therefore I tried to optimize the update for speed. I noticed that performing the upgrade partially in PHP made it much faster, so I assumed that this means that I am not doing MySQL correctly.

I simplified the problem to try to narrow it down ...

  GRID_TABLE POSTCODE_TABLE
 idNo, lat, lng, nearestPostcode postcode, lat, lng
 ________________________________ _____________________
 1 57.1 -2.3 - AB12 3BA 56.3 -2.5
 2 56.8 -1.9 - AB12 1YA 56.2 -2.3
 .  .  .  .  .  .

 (200 entries) (35,000 entries)

I want to update GRID_TABLE with the nearest Postcode from POSTCODE_TABLE using latitude (lat) and longitude (lng) to find the closest zip code to each grid point ...

update grid_table set nearestPostcode = ( select postcode from postcode_table where lat > grid_table.lat -0.0037 and lat < grid_table.lat +0.0037 and lng > grid_table.lng -0.0068 and lng < grid_table.lng +0.0068 order by POW(lat - grid_table.lat,2) + POW((lng - grid_table.lng) *0.546,2) limit 1 ) 

The idea is that the where clause speeds up searches using indexes to narrow down the set to a few candidates, and then the order clause finds the closest one in that set.

This MySQL update takes 30 seconds, but if I instead update each row of GRID_TABLE individually in PHP, it will immediately flash.

 $queryStg = "select * from grid_table ;"; $sqlQuery1 = mysqli_query($mysqliLink, $queryStg); while( $sqlRow = mysqli_fetch_assoc( $sqlQuery1 ) ) { $idNo = $sqlRow['idNo']; $lat = $sqlRow['lat']; $lng = $sqlRow['lng']; $queryStg = " update grid_table set nearestPostcode = ( SELECT postcode FROM postcode_table where lat > " . ($lat - 0.0037) . " and lat < " . ($lat + 0.0037) . " and lng > " . ($lng - 0.0068) . " and lng < " . ($lng + 0.0068) . " ORDER BY POW(lat - $lat, 2) + POW((lng - $lng) * 0.546, 2) ASC limit 1 ) where idNo= $idNo; "; $sqlQuery2 = mysqli_query($mysqliLink, $queryStg); } 

Surely the MySQL version should be faster than the PHP version?

Here is MySQL for tables ...

  CREATE TABLE `grid_table` (
     `idNo` INT (11) NOT NULL AUTO_INCREMENT,
     `lat` FLOAT (6,4) NOT NULL COMMENT 'latitude',
     `lng` FLOAT (6,4) NOT NULL COMMENT 'longitude',
     `nearestPostcode` CHAR (8) NOT NULL,
     PRIMARY KEY (`idNo`),
     INDEX `lat_lng` (` lat`, `lng`)
 )
 ENGINE = MyISAM
 ROW_FORMAT = DEFAULT
 AUTO_INCREMENT = 30047 
  CREATE TABLE `postcode_table` (
     `postcode` CHAR (8) NOT NULL,
     `lat` FLOAT (6,4) NOT NULL COMMENT 'latitude',
     `lng` FLOAT (6,4) NOT NULL COMMENT 'longitude',
     PRIMARY KEY (`postcode`),
     INDEX `lat` (` lat`),
     INDEX `lng` (` lng`),
     INDEX `lat_lng` (` lat`, `lng`)
 )
 ENGINE = MyISAM
 ROW_FORMAT = DEFAULT

MySQL import file here ... https://docs.google.com/leaf?id=0B93lksnTC7_cM2Y2ZDk1Y2YtMGQ3Yy00OTIxLTk0ZDAtZmE2NmQ3YTc1ZWRm&hl=en

(if you run UPDATE, the next 10 codes will be added).

UPDATE AFTER ANSWERS ...

I ran this ...

 explain extended SELECT postcode FROM postcode_table where lat > 57.0 and lat < 57.0074 and lng > -2.013 and lng < -2 ORDER BY POW(lat - 57.0, 2) + POW((lng - -2) * 0.546, 2) ASC 

He is back...

  id, select_type, table, type, possible_keys, key, key_len, ref, rows, filtered, Extra
 1, SIMPLE, postcode_table, range, lat, lng, lat_lng, lat_lng, 8, NULL, 65,100.00, Using where;  Using filesort

Removing "order" using "caluse" → no speed difference.

Simplification of the 'where' clause by removing the 'lng', i.e.

  where lat between grid_table.lat - 0.0037 and grid_table.lat + 0.0037 
-> faster: 3 s, not 30 seconds.

Using spatial column and index (see below) -> much slower (190 seconds). Not sure if I did it right.

  ALTER TABLE `grid_table` ADD COLUMN` coords` POINT NOT NULL;
 update grid_table set coords = POINT (lat, lng);
 ALTER TABLE `grid_table` ADD SPATIAL INDEX` coords` (`coords`);

 ALTER TABLE `postcode_table` ADD COLUMN` coords` POINT NOT NULL;
 update postcode_table set coords = POINT (lat, lng);
 ALTER TABLE `postcode_table` ADD SPATIAL INDEX` coords` (`coords`);

 analyze table grid_table;
 optimize table grid_table;
 analyze table postcode_table;
 optimize table postcode_table;
  update grid_table set nearestPostcode = (
     select postcode from postcode_table 
     WHERE MBRContains (GeomFromText (concat (
          'POLYGON ((', 
           grid_table.lat - 0.0037, '', grid_table.lng - 0.0068, ',',
           grid_table.lat - 0.0037, '', grid_table.lng + 0.0068, ',',
           grid_table.lat + 0.0037, '', grid_table.lng - 0.0068, ',',
           grid_table.lat - 0.0037, '', grid_table.lng - 0.0068, 
           '))')), postcode_table.coords)
      order by POW (lat - grid_table.lat, 2) + POW ((lng - grid_table.lng) * 0.546.2)
      limit 1 
      )
+7
source share
3 answers

In your version of MySQL, your subquery works with all the 30000 grid_table entries, whether in your version of PHP, this is only one. When you add where to an external PK table.

I suggest you change the update request here. For example, try to do this without a subquery, several updates, as here http://dev.mysql.com/doc/refman/5.0/en/update.html .

I think this should help.

Something like:

 update grid_table, postcode_table set grid_table.nearestPostcode = postcode_table.postcode where postcode_table.lat > grid_table.lat - 0.0037 and postcode_table.lat < grid_table.lat + 0.0037 and postcode_table.lng > grid_table.lng - 0.0068 and lng < grid_table.lng + 0.0068 group by grid_table.idNo having (POW(lat - grid_table.lat,2) + POW((lng - grid_table.lng) *0.546,2)) = min(POW(lat - grid_table.lat,2) + POW((lng - grid_table.lng) *0.546,2)) 

Maybe this version may help, but I'm not sure. I suppose the main root problem in your first version is the subquery for all records.

To explain the update , you can "convert" it to a similar select:

 explain select *, ( select postcode from postcode_table where lat > grid_table.lat -0.0037 and lat < grid_table.lat +0.0037 and lng > grid_table.lng -0.0068 and lng < grid_table.lng +0.0068 order by POW(lat - grid_table.lat,2) + POW((lng - grid_table.lng) *0.546,2) limit 1 ) nearestPostcode from grid_table 

And you will see:

 id select_type table type possible_keys key key_len ref rows Extra 1 PRIMARY grid_table ALL 224 2 DEPENDENT SUBQUERY postcode_table ALL lat,lng,lat_lng 35605 Using where; Using temporary; Using filesort 

But in case of idNo we have:

 explain select *, ( select postcode from postcode_table where lat > grid_table.lat -0.0037 and lat < grid_table.lat +0.0037 and lng > grid_table.lng -0.0068 and lng < grid_table.lng +0.0068 order by POW(lat - grid_table.lat,2) + POW((lng - grid_table.lng) *0.546,2) limit 1 ) nearestPostcode from grid_table where idNo = 1487; id select_type table type possible_keys key key_len ref rows Extra 1 PRIMARY grid_table const PRIMARY PRIMARY 4 const 1 2 DEPENDENT SUBQUERY postcode_table range lat,lng,lat_lng lat 4 18 Using where; Using filesort 

So, we have 35605 lines against ~ 18 * 224 (~ 4000).

To find the right query, try finding the right one.

Update

The subquery is not the root here :( Therefore, I think we should try some pre-calculated + indexed column. The goal is to avoid order by SOMEFUNC()

+4
source

Look at the execution plan to find out what has been going on for so long. http://dev.mysql.com/doc/refman/5.5/en/execution-plan-information.html

0
source

My guess is that the difference is that you provide the value of $lat in the query string, thereby saving large scans for the search here: -

 order by POW(lat - grid_table.lat,2) 

As with Mr47, you can see the EXPLAIN SQL statement.

0
source

All Articles