I'm fairly new to MySQL, and I'm trying to select a distinct set of rows using this statement:
SELECT DISTINCT sp.atcoCode, sp.name, sp.longitude, sp.latitude
FROM `transportdata`.stoppoints as sp
INNER JOIN `vehicledata`.gtfsstop_times as st ON sp.atcoCode = st.fk_atco_code
INNER JOIN `vehicledata`.gtfstrips as trip ON st.trip_id = trip.trip_id
INNER JOIN `vehicledata`.gtfsroutes as route ON trip.route_id = route.route_id
INNER JOIN `vehicledata`.gtfsagencys as agency ON route.agency_id = agency.agency_id
WHERE agency.agency_id IN (1,2,3,4);
However, the select statement takes about 10 minutes, so something is clearly wrong.
One significant factor is that the gtfsstop_times table is huge (about 250 million records).
The indexes seem to be set up correctly; all of the joins listed above use indexed columns. Table sizes are approximately:
gtfsagencys - 4 rows
gtfsroutes - 56,000 rows
gtfstrips - 5,500,000 rows
gtfsstop_times - 250,000,000 rows
`transportdata`.stoppoints - 400,000 rows
The server has 22 GB of memory, I have set the InnoDB buffer pool to 8 GB, and I'm using MySQL 5.6.
Can anyone see a way to make this run faster? Or indeed, at all!
Does it matter that the stoppoints table is in a different schema?
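For comparison, the same result could probably be expressed without DISTINCT by testing for the existence of a matching stop time. This is only a sketch, and it rests on two assumptions that are not confirmed above: that atcoCode is unique in stoppoints (so each stop point comes back once), and that every route.agency_id refers to an existing row in gtfsagencys (so the join to that table can be replaced by a filter on route.agency_id). Whether it actually runs faster depends on the plan the optimizer picks, so it would need checking with EXPLAIN.

```sql
-- Sketch only. Assumes atcoCode is unique in stoppoints and that
-- route.agency_id always points at an existing agency row.
SELECT sp.atcoCode, sp.name, sp.longitude, sp.latitude
FROM `transportdata`.stoppoints AS sp
WHERE EXISTS (
    SELECT 1
    FROM `vehicledata`.gtfsstop_times AS st
    INNER JOIN `vehicledata`.gtfstrips AS trip ON st.trip_id = trip.trip_id
    INNER JOIN `vehicledata`.gtfsroutes AS route ON trip.route_id = route.route_id
    WHERE st.fk_atco_code = sp.atcoCode       -- correlate on the stop code
      AND route.agency_id IN (1, 2, 3, 4)     -- filter on routes, not agencies
);
```

The idea is that the 250 million stop time rows never have to be joined in full and then de-duplicated; each stop point only needs one matching row to qualify.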
EDIT: EXPLAIN SELECT ... returns this:

Carlos P