If you are using MySQL 5.6 or later, you can ask the query optimizer what it is doing:
SET optimizer_trace="enabled=on";
You will almost certainly need to refer to the following sections of the MySQL documentation: Tracing the Optimizer and The Optimizer.
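As a rough sketch of how the trace is typically used: switch it on, run the statement, then read the result from `INFORMATION_SCHEMA.OPTIMIZER_TRACE`. The join below is the one from the question with the unknown `<condition>` left out, so treat it as a stand-in for your real query.

```sql
-- Turn tracing on for the current session only.
SET optimizer_trace="enabled=on";

-- Run the statement you want traced (stand-in for the real query;
-- the <condition> from the question is not known, so it is omitted).
SELECT COUNT(A.id) FROM A INNER JOIN B ON A.b_id = B.id;

-- Read the trace for the most recent statement. TRACE is a JSON document.
SELECT QUERY, TRACE, MISSING_BYTES_BEYOND_MAX_MEM_SIZE
FROM INFORMATION_SCHEMA.OPTIMIZER_TRACE;

-- Turn tracing back off when done.
SET optimizer_trace="enabled=off";
```

If `MISSING_BYTES_BEYOND_MAX_MEM_SIZE` is greater than zero, the trace was truncated and `optimizer_trace_max_mem_size` probably needs raising.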
Looking at the first EXPLAIN, it seems that the query is quicker, probably because the optimizer can use table B to filter down to the required rows based on the join, and then use the foreign key to get the rows in table A.
In the EXPLAIN, it is this bit that is interesting; there is only one row matching, and it is using schema.A.b_id. Effectively this is pre-filtering the rows from A, which is where I think the performance difference comes from.
| ref | rows | Extra |
| schema.A.b_id | 1 | Using where |
So, as usual with queries, it all comes down to indexes - or rather, missing indexes. Just because you have indexes on individual fields, this does not necessarily mean that they are suitable for the query that you are using.
Basic rule: if EXPLAIN does not say Using index, you need to add a suitable index.
Looking at the EXPLAIN output, the first interesting thing is, ironically, the last thing on each line; namely Extra.
In the first example, we see:
| 1 | SIMPLE | A | .... Using where |
| 1 | SIMPLE | B | ... Using where |
Both of these showing only Using where is not good; ideally at least one, and preferably both, should say Using index.
When you do
SELECT COUNT(A.id) FROM A WHERE (b_id != 23) AND <condition>;
and see Using where, then you need to add an index, because it is most likely doing a table scan.
For example, if you did:
EXPLAIN SELECT COUNT(A.id) FROM A WHERE (Id > 23)
You should see Using where; Using index (assuming Id is the primary key and so has an index).
If you then added a condition to the end:
EXPLAIN SELECT COUNT(A.id) FROM A WHERE (Id > 23) and Field > 0
and see Using where, then you need to add an index covering the two fields. Just having an index on a field does not mean that MySQL will be able to use that index for a query across several fields; that is something the query optimizer decides. I am not entirely sure of the internal rules, but generally, adding an extra index that matches the query helps immensely.
So, adding an index (on both fields in the query above):
ALTER TABLE `A` ADD INDEX `IndexIdField` (`Id`,`Field`)
should change it so that, when querying on those two fields, this index is used.
I tried this in one of my databases with the Transactions and User tables.
I will use this query:
EXPLAIN SELECT COUNT(*) FROM transactions WHERE (id < 9000) and user != 11;
Running it without an index on the two fields:
PRIMARY,user PRIMARY 4 NULL 14334 Using where
Then add the index:
ALTER TABLE `transactions` ADD INDEX `IndexIdUser` (`id`, `user`);
Then run the same query again, and this time:
PRIMARY,user,Index 4 Index 4 4 NULL 12628 Using where; Using index
This time it is using the new index (note Using index in Extra), and as a result it should be a lot quicker.
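To compare the two plans without dropping the index again, one option is an `IGNORE INDEX` hint. This is only a sketch: it assumes the index is named `IndexIdUser`, as in the `ALTER TABLE` above, so substitute whatever name your index actually has.

```sql
-- Plan with the composite index hidden from the optimizer
-- (index name assumed from the ALTER TABLE statement above).
EXPLAIN SELECT COUNT(*)
FROM transactions IGNORE INDEX (`IndexIdUser`)
WHERE (id < 9000) AND `user` != 11;

-- Plan with the optimizer free to pick any index.
EXPLAIN SELECT COUNT(*)
FROM transactions
WHERE (id < 9000) AND `user` != 11;
```

Comparing the key, rows and Extra columns of the two outputs shows whether the composite index is actually what changes the plan.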
Following the comments from @Wrikken (and bear in mind that I don't have the exact schema or data, so some of this investigation required assumptions about the schema, which might be wrong):
SELECT COUNT(A.id) FROM A FORCE INDEX (b_id) would perform at least as well as SELECT COUNT(A.id) FROM A INNER JOIN B ON A.b_id = B.id.
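One way to check that claim against your own data is to EXPLAIN both forms side by side. This sketch assumes the index on A.b_id is itself named `b_id`, which the question does not confirm, and it leaves out the unknown `<condition>`.

```sql
-- Single-table form, forcing the index on b_id
-- (index name b_id is an assumption).
EXPLAIN SELECT COUNT(A.id)
FROM A FORCE INDEX (b_id)
WHERE A.b_id != 23;

-- Join form, with the optimizer free to choose its own plan.
EXPLAIN SELECT COUNT(A.id)
FROM A INNER JOIN B ON A.b_id = B.id
WHERE A.b_id != 23;
```

EXPLAIN only shows estimates, so timing both statements on realistic data is still needed before concluding which one is faster.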
If we look at the first EXPLAIN in the OP, we see that there are two elements to the query. Referring to the EXPLAIN documentation for eq_ref, I can see that it determines the rows to consider based on this relationship.
The order of the EXPLAIN output does not necessarily mean that it does one thing and then the other; it is simply what was chosen to execute the query (at least as far as I can tell).
For some reason, the query optimizer decided not to use the index on b_id - I assume that because of the query, the optimizer decided that it would be more efficient to scan the table.
The second EXPLAIN bothers me slightly because it does not consider the index on b_id; possibly because of the AND <condition> (which is omitted, so I am guessing at what it could be). When I try this with an index on b_id, it does use the index; but as soon as a condition is added, it stops using it.
So, when doing
SELECT COUNT(A.id) FROM A INNER JOIN B ON A.b_id = B.id
this all indicates to me that the PRIMARY index on B is where the speed difference comes from. I assume, because of schema.A.b_id in the EXPLAIN, that there is a foreign key on this table, which must be a better collection of related rows than the index on b_id. So the query optimizer can use this relation to decide which rows to pick, and because a primary index is better than secondary indexes, it is much quicker to select the rows from B and then use the relation to match the rows in A.