MariaDB does not use the index for a single-column self-join due to low selectivity (all NULL)

We have a query that looks for duplicates in one of our tables, based on a rarely populated identifier; call it rareIdentifier INT(10) UNSIGNED NULL. We have a regular single-column index on this column.

The query is as follows:

 SELECT a.id, b.id FROM widget a INNER JOIN widget b ON a.rareIdentifier = b.rareIdentifier; 

The problem is that on a recent run of the duplicate search, we actually had 0 rows with a value for rareIdentifier; i.e. all rows had NULL for this column. MariaDB decided not to use the index, choosing the Using join buffer (flat, BNL join) approach instead, which scanned the entire table.

But NULLs cannot equal each other! So why is it trying to compare every pair of rows?

I understand that MySQL / MariaDB will not use an index if its selectivity is too low, which I suppose is the case here. But really, it seems that having only 1 value in the index should mean the query is pretty much instantaneous.
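For reference, a minimal sketch of how NULL comparisons evaluate in MySQL / MariaDB. The column aliases are illustrative names only:

```sql
-- The = operator yields NULL (not true) when either side is NULL,
-- so a join condition a.x = b.x can never match two NULL values.
SELECT NULL = NULL   AS equals_result,     -- NULL
       NULL <=> NULL AS null_safe_result;  -- 1 (NULL-safe equality operator)
```

Since the join in the question uses =, rows with a NULL rareIdentifier cannot contribute any pairs to the result.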

The table is an InnoDB table.

2 answers

MariaDB's optimizer may not be smart enough to realize that a comparison with NULL always yields NULL, and is therefore never true. Perhaps it just decided that "all values are the same, so they should be equal" (but in fact I really don't know).

As a workaround, adding ... AND a.rareIdentifier IS NOT NULL should give the optimizer enough of a hint.
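A sketch of the query with that extra condition, using the widget table and rareIdentifier column from the question. Whether the optimizer then picks the index is not guaranteed, so it is worth checking the plan with EXPLAIN on your own server version:

```sql
-- Same self-join as in the question, plus an explicit NOT NULL filter.
-- The filter does not change the result (NULL = NULL is never true);
-- it only makes the exclusion of NULL rows visible to the optimizer.
EXPLAIN
SELECT a.id, b.id
FROM widget a
INNER JOIN widget b
        ON a.rareIdentifier = b.rareIdentifier
       AND a.rareIdentifier IS NOT NULL;
```

Drop the EXPLAIN keyword to run the query itself once the plan looks reasonable.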


In most cases this will probably be faster, especially if there are many rows with the same rareIdentifier.

 SELECT rareIdentifier, MIN(id), MAX(id), COUNT(*) FROM tbl WHERE rareIdentifier IS NOT NULL GROUP BY rareIdentifier HAVING COUNT(*) > 1; 

Or you can use GROUP_CONCAT(id) instead of MIN and MAX. (However, if there are many duplicates, the list may be truncated.)

Assuming InnoDB and INDEX(rareIdentifier), this SELECT should be a very efficient "range" scan of the index.
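A sketch of the GROUP_CONCAT variant, using the same placeholder table name tbl as above. The truncation is governed by the group_concat_max_len system variable, whose default differs between versions. The value set below is an arbitrary choice for illustration:

```sql
-- Raise the per-session limit on the GROUP_CONCAT result length
-- (1000000 is an arbitrary example value, not a recommendation).
SET SESSION group_concat_max_len = 1000000;

SELECT rareIdentifier,
       GROUP_CONCAT(id ORDER BY id) AS ids,  -- comma-separated ids sharing the value
       COUNT(*) AS cnt
FROM tbl
WHERE rareIdentifier IS NOT NULL
GROUP BY rareIdentifier
HAVING COUNT(*) > 1;
```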

Back to your question ...

there were actually 0 rows ... MariaDB decided not to use an index

I used to see this often in older versions of MySQL. I wonder whether Oracle fixed it but MariaDB missed the fix.
