(Bitwise) Supersets and Subsets in MySQL

The following queries are effective in MySQL:

SELECT * FROM table WHERE field & number = number; # to find values with superset of number bits SELECT * FROM table WHERE field | number = number; # to find values with subset of number bits 

... if an index was created for the field?

If not, is there a way to do this faster?

+4
source share
3 answers

Update:

See this blog post for performance details:


 SELECT * FROM table WHERE field & number = number SELECT * FROM table WHERE field | number = number 

This metric can be effective in two ways:

  • To avoid early table scans (since the value for comparison is contained in the index itself)
    • To limit the range of considered values.

None of the conditions in the queries above can be changed, this index will not be used to scan the range (with conditions such as now).

However, point 1 saved, and the index may be useful.

If your table contains, say, 100 bytes per row on average and 1,000,000 records, then scanning the table will need to scan 100 Mb data.

If you have an index (with a 4 byte key, a 6 byte line pointer and some internal overhead), the request will only need to scan 10 Mb data, plus additional data from the if table, the filter will be successful.

  • A table scan is more efficient if your condition is not selective (you have a high probability of meeting the condition).
  • Index scanning is more effective if your condition is selective (you have a low probability to match the condition).

Both of these queries will require a scan of the entire index.

But by rewriting the AND query, you can also take advantage of index ranking.

This condition:

field & number = number

can only match fields if the most significant bits of number are also set to field .

And you should just provide this additional condition for the request:

 SELECT * FROM table WHERE field & number = number AND field >= 0xFFFFFFFF & ~((2 << FLOOR(LOG(2, 0xFFFFFFFF & ~number))) - 1) 

This will use the range for coarse filtration and the conditions for fine filtration.

The more bits for number not set at the end, the better.

+6
source

I doubt that the optimizer will see that one ...

Perhaps you can call EXPLAIN for these queries and confirm my pessimistic assumption. (remembering, of course, that most decisions of a query plan are based on a specific instance of a given database, that is, variable amounts of data and / or just data with a different statistical profile can create different plans).

Assuming that the table has a significant number of rows and that the β€œbattered” criteria remain sufficiently selective), possible optimization is achieved by avoiding the bitwise operation in each individual row by overwriting the query using the IN construct (or with JOIN)

Something like this (conceptual, i.e. not verified)

 CREATE TEMPORARY TABLE tblFieldValues (Field INT); INSERT INTO tblFieldValues SELECT DISTINCT Field FROM table; -- SELECT * FROM table WHERE field | number = number; -- now becomes SELECT * FROM table t WHERE field IN (SELECT Field FROM tblFieldValues WHERE field | number = number); 

The full benefits of this approach should be evaluated with different use cases (all of which have a significant number of rows in the table, since otherwise the efficient WHERE field | number = number approach is quite effective), but I suspect it could be significantly faster . Further success can be achieved if tblFieldValues ​​does not need to be recreated each time. Efficiently creating this table, of course, involves an index in a field in the original table.

+1
source

I tried this myself, and bitwise operations are not enough so that Mysql does not use the index in the "field" column. Most likely, a full index scan is occurring.

0
source

All Articles