I have two very large tables to combine, and therefore I tried to optimize the update for speed. I noticed that performing the upgrade partially in PHP made it much faster, so I assumed that this means that I am not doing MySQL correctly.
I simplified the problem to try to narrow it down ...
GRID_TABLE POSTCODE_TABLE
idNo, lat, lng, nearestPostcode postcode, lat, lng
________________________________ _____________________
1 57.1 -2.3 - AB12 3BA 56.3 -2.5
2 56.8 -1.9 - AB12 1YA 56.2 -2.3
. . . . . .
(200 entries) (35,000 entries)
I want to update GRID_TABLE with the nearest Postcode from POSTCODE_TABLE using latitude (lat) and longitude (lng) to find the closest zip code to each grid point ...
update grid_table set nearestPostcode = ( select postcode from postcode_table where lat > grid_table.lat -0.0037 and lat < grid_table.lat +0.0037 and lng > grid_table.lng -0.0068 and lng < grid_table.lng +0.0068 order by POW(lat - grid_table.lat,2) + POW((lng - grid_table.lng) *0.546,2) limit 1 )
The idea is that the where clause speeds up searches using indexes to narrow down the set to a few candidates, and then the order clause finds the closest one in that set.
This MySQL update takes 30 seconds, but if I instead update each row of GRID_TABLE individually in PHP, it will immediately flash.
$queryStg = "select * from grid_table ;"; $sqlQuery1 = mysqli_query($mysqliLink, $queryStg); while( $sqlRow = mysqli_fetch_assoc( $sqlQuery1 ) ) { $idNo = $sqlRow['idNo']; $lat = $sqlRow['lat']; $lng = $sqlRow['lng']; $queryStg = " update grid_table set nearestPostcode = ( SELECT postcode FROM postcode_table where lat > " . ($lat - 0.0037) . " and lat < " . ($lat + 0.0037) . " and lng > " . ($lng - 0.0068) . " and lng < " . ($lng + 0.0068) . " ORDER BY POW(lat - $lat, 2) + POW((lng - $lng) * 0.546, 2) ASC limit 1 ) where idNo= $idNo; "; $sqlQuery2 = mysqli_query($mysqliLink, $queryStg); }
Surely the MySQL version should be faster than the PHP version?
Here is MySQL for tables ...
CREATE TABLE `grid_table` (
`idNo` INT (11) NOT NULL AUTO_INCREMENT,
`lat` FLOAT (6,4) NOT NULL COMMENT 'latitude',
`lng` FLOAT (6,4) NOT NULL COMMENT 'longitude',
`nearestPostcode` CHAR (8) NOT NULL,
PRIMARY KEY (`idNo`),
INDEX `lat_lng` (` lat`, `lng`)
)
ENGINE = MyISAM
ROW_FORMAT = DEFAULT
AUTO_INCREMENT = 30047
CREATE TABLE `postcode_table` (
`postcode` CHAR (8) NOT NULL,
`lat` FLOAT (6,4) NOT NULL COMMENT 'latitude',
`lng` FLOAT (6,4) NOT NULL COMMENT 'longitude',
PRIMARY KEY (`postcode`),
INDEX `lat` (` lat`),
INDEX `lng` (` lng`),
INDEX `lat_lng` (` lat`, `lng`)
)
ENGINE = MyISAM
ROW_FORMAT = DEFAULT
MySQL import file here ... https://docs.google.com/leaf?id=0B93lksnTC7_cM2Y2ZDk1Y2YtMGQ3Yy00OTIxLTk0ZDAtZmE2NmQ3YTc1ZWRm&hl=en
(if you run UPDATE, the next 10 codes will be added).
UPDATE AFTER ANSWERS ...
I ran this ...
explain extended SELECT postcode FROM postcode_table where lat > 57.0 and lat < 57.0074 and lng > -2.013 and lng < -2 ORDER BY POW(lat - 57.0, 2) + POW((lng - -2) * 0.546, 2) ASC
He is back...
id, select_type, table, type, possible_keys, key, key_len, ref, rows, filtered, Extra
1, SIMPLE, postcode_table, range, lat, lng, lat_lng, lat_lng, 8, NULL, 65,100.00, Using where; Using filesort
Removing "order" using "caluse" → no speed difference.
Simplification of the 'where' clause by removing the 'lng', i.e.
where lat between grid_table.lat - 0.0037 and grid_table.lat + 0.0037
-> faster: 3 s, not 30 seconds.
Using spatial column and index (see below) -> much slower (190 seconds). Not sure if I did it right.
ALTER TABLE `grid_table` ADD COLUMN` coords` POINT NOT NULL;
update grid_table set coords = POINT (lat, lng);
ALTER TABLE `grid_table` ADD SPATIAL INDEX` coords` (`coords`);
ALTER TABLE `postcode_table` ADD COLUMN` coords` POINT NOT NULL;
update postcode_table set coords = POINT (lat, lng);
ALTER TABLE `postcode_table` ADD SPATIAL INDEX` coords` (`coords`);
analyze table grid_table;
optimize table grid_table;
analyze table postcode_table;
optimize table postcode_table;
update grid_table set nearestPostcode = (
select postcode from postcode_table
WHERE MBRContains (GeomFromText (concat (
'POLYGON ((',
grid_table.lat - 0.0037, '', grid_table.lng - 0.0068, ',',
grid_table.lat - 0.0037, '', grid_table.lng + 0.0068, ',',
grid_table.lat + 0.0037, '', grid_table.lng - 0.0068, ',',
grid_table.lat - 0.0037, '', grid_table.lng - 0.0068,
'))')), postcode_table.coords)
order by POW (lat - grid_table.lat, 2) + POW ((lng - grid_table.lng) * 0.546.2)
limit 1
)