To optimize these queries, you need to slightly de-normalize the data.
For example, you might have a track table containing the id , name and release date tracks, and a map_location_to_track table that describes which tracks can be downloaded from. To answer the "10 most recent tracks for location A," you need to get ALL tracks for location A from map_location_to_track , and then attach them to the track table to arrange them by release date and select the top 10.
If instead all of the data is in the same table, the sequencing step can be avoided. For instance...
CREATE TABLE map_location_to_track ( location_id INT, track_id INT, release_date DATETIME, PRIMARY KEY (location_id, release_date, track_id) ) SELECT * FROM map_location_to_track WHERE location_id = A ORDER BY release_date DESC LIMIT 10
Having location_id as the first entry in the primary key ensures that the WHERE clause is just an index lookup. Then there is no need to reorder the data that he already ordered for us with the primary key, but instead just select 10 records at the end.
You can really join the track table to get a name, price, etc., but now you only need to do this for 10 entries, not just everything in this place.
To solve the same query for “locations A OR B”, there are several options that can be performed differently depending on the RDBMS you are using.
The first is simple, although some RDBMSs do not play well with IN ...
SELECT track_id, release_date FROM map_location_to_track WHERE location_id IN (A, B) GROUP BY track_id, release_date ORDER BY release_date DESC LIMIT 10
The next option is almost identical, but still some RDBMSs do not play well with the OR logic applied to the INDEX.
SELECT track_id, release_date FROM map_location_to_track WHERE location_id = A or location_id = B GROUP BY track_id, release_date ORDER BY release_date DESC LIMIT 10
In any case, the algorithm used to streamline the list of entries to 10 is hidden from you. It is a question to try and see; an index is still available so this MAY be performed.
An alternative is to explicitly define part of the approach in your SQL statement ...
SELECT * FROM ( SELECT track_id, release_date FROM map_location_to_track WHERE location_id = A ORDER BY release_date DESC LIMIT 10 UNION SELECT track_id, release_date FROM map_location_to_track WHERE location_id = B ORDER BY release_date DESC LIMIT 10 ) AS data ORDER BY release_date DESC LIMIT 10
The optimizer can still understand that these two combined datasets are ordered and therefore make the outer order very fast. However, even if not, an order for 20 items is pretty quick. More importantly, these are fixed overheads: it doesn’t matter if you have a billion tracks in each place, we simply combine the two lists of 10.
The most difficult thing to optimize is the AND condition, but even then the existence of the “TOP 10” restriction can help work wonders.
Adding a HAVING clause to IN or OR approaches can solve this problem, but, again, depending on your RDBMS, it may work less optimally.
SELECT track_id, release_date FROM map_location_to_track WHERE location_id = A or location_id = B GROUP BY track_id, release_date HAVING COUNT(*) = 2 ORDER BY release_date DESC LIMIT 10
An alternative is to try the two-query approach ...
SELECT location_a.* FROM ( SELECT track_id, release_date FROM map_location_to_track WHERE location_id = A ) AS location_a INNER JOIN ( SELECT track_id, release_date FROM map_location_to_track WHERE location_id = B ) AS location_b ON location_a.release_date = location_b.release_date AND location_a.track_id = location_b.track_id ORDER BY location_a.release_date DESC LIMIT 10
This time we cannot limit the two subqueries to only 10 records; for all we know, the last 10 at location a do not appear at location b at all. However, the primary key saves us again. Two data sets are omitted by release date, RDBMScan can start from the top record of each set and combine the two until there are 10 records, and then stop.
NOTE. Since release_date is in the primary key, and before track_id , make sure that it is used in the connection.
Depending on the DBMS, you do not even need subqueries. Perhaps you can just join the table without changing the RDBMS plan ...
SELECT location_a.* FROM map_location_to_track AS location_a INNER JOIN map_location_to_track AS location_b ON location_a.release_date = location_b.release_date AND location_a.track_id = location_b.track_id WHERE location_a.location_id = A AND location_b.location_id = B ORDER BY location_a.release_date DESC LIMIT 10
All in all, a combination of three things makes this pretty effective:
- Partially De-Normalization of data to ensure it is friendly for our needs
- Knowing that we only need the first 10 results
- Knowing that we ever deal with two locations maximum
There are variations that can be optimized for any number of records and any number of locations, but they are much less effective than the problem posed in this matter.