How to avoid a full table scan on this basic inner join?

Question

I have a table that has a foreign key to a table that stores some blob data. When I do an inner join on the tables with a condition on the main table, the join type goes from "index" to "ALL". I would like to avoid this since my blob table is on the order of tens of gigabytes. How can I avoid this?

Here is the basic inner join:

 EXPLAIN SELECT m.id, b.id, b.data
 FROM metadata m, blobstore b
 WHERE m.fkBlob = b.id;

 1, 'SIMPLE', 'm', 'index', 'fk_blob', 'fk_blob', '4', '', 1, 'Using index'
 1, 'SIMPLE', 'b', 'eq_ref', 'PRIMARY', 'PRIMARY', '4', 'blob_index.m.fkBlob', 1, ''

Here I add the condition to the main table:

 EXPLAIN SELECT m.id, b.id, b.data
 FROM metadata m, blobstore b
 WHERE m.fkBlob = b.id AND m.start < '2009-01-01';

 1, 'SIMPLE', 'b', 'ALL', 'PRIMARY', '', '', '', 1, ''
 1, 'SIMPLE', 'm', 'ref', 'fk_blob,index_start', 'fk_blob', '4', 'blob_index.b.id', 1, 'Using where'

Note that the order in which the tables are listed has changed. Now it performs a full table scan on the blob table due to the condition that I added regarding the main table.

Here is the schema:

 DROP TABLE IF EXISTS `blob_index`.`metadata`;
 CREATE TABLE `blob_index`.`metadata` (
   `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
   `fkBlob` int(10) unsigned NOT NULL,
   `start` datetime NOT NULL,
   PRIMARY KEY (`id`),
   KEY `fk_blob` (`fkBlob`),
   KEY `index_start` (`start`),
   CONSTRAINT `fk_blob` FOREIGN KEY (`fkBlob`) REFERENCES `blobstore` (`id`)
 ) ENGINE=InnoDB DEFAULT CHARSET=latin1;

 DROP TABLE IF EXISTS `blob_index`.`blobstore`;
 CREATE TABLE `blob_index`.`blobstore` (
   `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
   `data` mediumblob NOT NULL,
   PRIMARY KEY (`id`)
 ) ENGINE=InnoDB DEFAULT CHARSET=latin1;

+6

inner-join mysql blob query-optimization

Fredrick Dec 23 '09 at 12:32

5 answers

The optimizer believes the query will be cheaper with the table order swapped (which most likely means that the statistics are out of date).

You can try adding an index on metadata (start, fkBlob):

 CREATE INDEX ix_metadata_start_blob ON metadata (start, fkBlob)

and run ANALYZE TABLE on both tables.

This way, the index on start will be used to filter metadata, which will become the leading table.

You can also force the join order:

 SELECT * FROM metadata m STRAIGHT_JOIN blobstore b ON b.id = m.fkBlob WHERE m.start <= '2009-01-01'

although this is usually not recommended.
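A minimal sketch of the check that goes with the steps above, assuming the schema from the question. Whether the plan actually changes depends on your data and MySQL version, so compare the EXPLAIN output before and after:

```sql
-- Refresh the index statistics the optimizer uses for its row estimates.
ANALYZE TABLE metadata, blobstore;

-- Re-check the plan. The goal is for `m` to be listed first (filtered via an
-- index on `start`) and for `b` to be joined by eq_ref on its PRIMARY key.
EXPLAIN
SELECT m.id, b.id, b.data
FROM metadata m
JOIN blobstore b ON b.id = m.fkBlob
WHERE m.start < '2009-01-01';
```

If `b` still shows type ALL on a realistically sized table, that is the case where STRAIGHT_JOIN is worth trying.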

+3

Quassnoi Dec 23 '09 at 17:28

If I read what you posted correctly, it goes from index to ref and from eq_ref to ALL.

 CREATE INDEX idx_metadata USING BTREE ON `metadata` (fkBlob,start);

That should bring it back.

0

Don Dec 23 '09 at 12:52

If the optimizer does not pick the right index, use hints:

 select /* INDEX <index_name> */ blah blah blah from ........
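Note that MySQL treats a plain `/* ... */` as an ordinary comment; its index hints (USE INDEX, FORCE INDEX, IGNORE INDEX) are written after the table name instead. A hedged sketch against the question's schema, using the existing index_start index; it may or may not change the join order on your data, so check it with EXPLAIN:

```sql
-- FORCE INDEX restricts the optimizer to index_start when reading `m`.
-- index_start is the index on `start` from the question's schema.
EXPLAIN
SELECT m.id, b.id, b.data
FROM metadata m FORCE INDEX (index_start)
JOIN blobstore b ON b.id = m.fkBlob
WHERE m.start < '2009-01-01';
```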

0

Venkataramesh kommoju Dec 29 '09 at 11:50

In the first example, MySQL used the metadata index fk_blob because it was a covering index: every column that you used in the query was present in the index (InnoDB secondary indexes also carry the primary key, so m.id is included). This is what "Using index" means. That query still did a full scan, but it scanned every row through a secondary index instead of the primary. Once you used start, you lost the covering index, and MySQL calculated that it was faster to use blobstore as the driving index. (The primary InnoDB index is clustered with the row storage.)

If you want MySQL to keep using the metadata index as the driving index, make sure that the table has a single index that is useful for the query. An index on (start, fkBlob) would be best for the second query, but it might not be useful for other queries. The next best option is to replace (fkBlob) with (fkBlob, start). You will have to balance having too many indexes (which are expensive to maintain) against having efficient query plans. Test, test, test, and never blindly believe EXPLAIN on your development database.
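As a rough sketch of the "replace (fkBlob) with (fkBlob, start)" option: the index name fk_blob_start is made up for illustration, and it is an assumption that your MySQL version lets you drop fk_blob once another index starting with fkBlob exists for the foreign key, so try this on a copy first:

```sql
-- Add the wider index first, so the foreign key on fkBlob always has an
-- index whose leading column is fkBlob.
ALTER TABLE metadata ADD INDEX fk_blob_start (fkBlob, start);

-- Then drop the narrower index that the new one makes redundant.
ALTER TABLE metadata DROP INDEX fk_blob;

-- Confirm which indexes remain.
SHOW INDEX FROM metadata;
```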

0

Ken fox Dec 29 '09 at 16:39

Michal Čihař · Accepted Answer · 2009-12-23T12:48:21+0000

I think you are trying this on an empty table (because MySQL estimates that a full table scan has to go through only one row), which may skew the optimizer's choices. When you do this on a real table, the EXPLAIN results may differ (and they actually changed in my test).
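One way to try this, sketched under the assumption that you are working in a scratch copy of the question's schema. The blob size, row counts, and dates below are arbitrary test values:

```sql
-- Seed one blob row, then double the table; each INSERT ... SELECT doubles
-- the row count, so repeat it until the table is large enough.
INSERT INTO blobstore (data) VALUES (REPEAT('x', 1024));
INSERT INTO blobstore (data) SELECT data FROM blobstore;
INSERT INTO blobstore (data) SELECT data FROM blobstore;
INSERT INTO blobstore (data) SELECT data FROM blobstore;

-- One metadata row per blob, with start dates spread on both sides of the
-- '2009-01-01' cutoff used in the question.
INSERT INTO metadata (fkBlob, start)
SELECT id, DATE_ADD('2008-01-01', INTERVAL id DAY) FROM blobstore;

-- Refresh statistics, then look at the plan again.
ANALYZE TABLE metadata, blobstore;

EXPLAIN
SELECT m.id, b.id, b.data
FROM metadata m, blobstore b
WHERE m.fkBlob = b.id AND m.start < '2009-01-01';
```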
