I made some improvements to the JOIN version; see below.
I voted for the JOIN approach for speed. Here is how I defined it:
HAVING version 1
mysql> FLUSH STATUS; mysql> SELECT city -> FROM us_vch200 -> WHERE state IN ('IL', 'MO', 'PA') -> GROUP BY city -> HAVING count(DISTINCT state) >= 3; +-------------+ | city | +-------------+ | Springfield | | Washington | +-------------+ mysql> SHOW SESSION STATUS LIKE 'Handler%'; +----------------------------+-------+ | Variable_name | Value | +----------------------------+-------+ | Handler_external_lock | 2 | | Handler_read_first | 1 | | Handler_read_key | 2 | | Handler_read_last | 1 | | Handler_read_next | 4175 | -- full index scan (etc) +----+-------------+-----------+-------+-----------------------+------------+---------+------+------+--------------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-----------+-------+-----------------------+------------+---------+------+------+--------------------------------------------------+ | 1 | SIMPLE | us_vch200 | range | state_city,city_state | city_state | 769 | NULL | 4176 | Using where; Using index for group-by (scanning) | +----+-------------+-----------+-------+-----------------------+------------+---------+------+------+--------------------------------------------------+
"Extra" indicates that he decided to do GROUP BY and use INDEX(city, state) , although INDEX(state, city) may make sense.
HAVING, version 2
Switching to INDEX(state, city) gives:
mysql> FLUSH STATUS; mysql> SELECT city -> FROM us_vch200 IGNORE INDEX(city_state) -> WHERE state IN ('IL', 'MO', 'PA') -> GROUP BY city -> HAVING count(DISTINCT state) >= 3; +-------------+ | city | +-------------+ | Springfield | | Washington | +-------------+ mysql> SHOW SESSION STATUS LIKE 'Handler%'; +----------------------------+-------+ | Variable_name | Value | +----------------------------+-------+ | Handler_commit | 1 | | Handler_external_lock | 2 | | Handler_read_key | 401 | | Handler_read_next | 398 | | Handler_read_rnd | 398 | (etc) +----+-------------+-----------+-------+-----------------------+------------+---------+------+------+------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-----------+-------+-----------------------+------------+---------+------+------+------------------------------------------+ | 1 | SIMPLE | us_vch200 | range | state_city,city_state | state_city | 2 | NULL | 397 | Using where; Using index; Using filesort | +----+-------------+-----------+-------+-----------------------+------------+---------+------+------+------------------------------------------+
Join
mysql> SELECT x.city -> FROM us_vch200 x -> JOIN us_vch200 y ON y.city= x.city AND y.state = 'MO' -> JOIN us_vch200 z ON z.city= x.city AND z.state = 'PA' -> WHERE x.state = 'IL'; +-------------+ | city | +-------------+ | Springfield | | Washington | +-------------+ 2 rows in set (0.00 sec) mysql> SHOW SESSION STATUS LIKE 'Handler%'; +----------------------------+-------+ | Variable_name | Value | +----------------------------+-------+ | Handler_commit | 1 | | Handler_external_lock | 6 | | Handler_read_key | 86 | | Handler_read_next | 87 | (etc) +----+-------------+-------+------+-----------------------+------------+---------+--------------------+------+--------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+------+-----------------------+------------+---------+--------------------+------+--------------------------+ | 1 | SIMPLE | y | ref | state_city,city_state | state_city | 2 | const | 81 | Using where; Using index | | 1 | SIMPLE | z | ref | state_city,city_state | state_city | 769 | const,world.y.city | 1 | Using where; Using index | | 1 | SIMPLE | x | ref | state_city,city_state | state_city | 769 | const,world.y.city | 1 | Using where; Using index | +----+-------------+-------+------+-----------------------+------------+---------+--------------------+------+--------------------------+
Only INDEX(state, city) required. The number of handlers is the smallest for this formulation, so I think it is the fastest.
Notice how the optimizer himself decided which table to start with, probably due to
+-------+----------+ | state | COUNT(*) | +-------+----------+ | IL | 221 | | MO | 81 | -- smallest | PA | 96 | +-------+----------+
conclusions
JOIN (without unnecessary table t ) is probably the fastest. Plus this composite index is needed: INDEX(state, city) .
To translate back to your use case:
city --> documentid state --> termid
Caution: YMMV, because the distribution of values ββfor documentid and termid can be very different from the test case used.