Should I COUNT(*) or not?

I know that in general it's a bad idea to make queries like this:

SELECT * FROM `group_relations` 

But when I just want a count, I use this query instead, since it keeps working if the table is modified and still gives the same results.

 SELECT COUNT(*) FROM `group_relations` 

Or, more specifically:

 SELECT COUNT(`group_id`) FROM `group_relations` 

I have the feeling that the latter could potentially be faster, but are there other things to consider?

Update: I'm using InnoDB in this case; sorry for not being more specific.

+72
performance mysql innodb
Jan 19 '09 at 11:07
14 answers

If the column in question is NOT NULL, both of your queries are equivalent. When group_id contains null values,

 select count(*) 

will count all rows, whereas

 select count(group_id) 

will only count rows where group_id is not null.
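A quick way to see the difference is to run both forms against a table that contains a NULL. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption; the NULL handling of COUNT is standard SQL, so it carries over), with a hypothetical three-row group_relations table.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; the table contents are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE group_relations (id INTEGER PRIMARY KEY, group_id INTEGER)")
conn.executemany(
    "INSERT INTO group_relations (group_id) VALUES (?)",
    [(1,), (2,), (None,)],  # the third row has a NULL group_id
)

# COUNT(*) counts every row, NULLs included.
count_star = conn.execute("SELECT COUNT(*) FROM group_relations").fetchone()[0]
# COUNT(group_id) skips rows where group_id IS NULL.
count_col = conn.execute("SELECT COUNT(group_id) FROM group_relations").fetchone()[0]

print(count_star, count_col)  # 3 2
```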

In addition, some database systems, such as MySQL, apply an optimization when you ask for COUNT(*), which makes such queries a little faster than counting a specific column.

Personally, when I'm just counting, I use COUNT(*) to be safe with NULLs.

+100
Jan 19 '09 at 11:12

If I remember correctly, in MySQL COUNT(*) counts all rows, while COUNT(column_name) counts only rows with a non-NULL value in that column.

+21
Jan 19 '09 at 11:10

COUNT(*) counts all rows, while COUNT(column_name) counts only rows with non-NULL values in the specified column.

It is important to note in MySQL:

COUNT() is very fast on MyISAM tables for * or NOT NULL columns, because the row count is cached. InnoDB does not cache the row count, so there is no performance difference between COUNT(*) and COUNT(column_name), regardless of whether the column can be NULL. You can read more about the differences in this MySQL Performance Blog post.

+11
Jan 19 '09 at 11:32

If you try SELECT COUNT(1) FROM group_relations, it will be a little faster because it won't try to retrieve information from your columns.

Edit: I just did some research and found out that this only happens on some DBMSs. SQL Server treats 1 and * the same, but on Oracle it is faster to use 1.

http://social.msdn.microsoft.com/forums/en-US/transactsql/thread/9367c580-087a-4fc1-bf88-91a51a4ee018/

Apparently there is no difference between the two in MySQL either; as in SQL Server, the parser apparently rewrites the query to select(1). Sorry if I misled you.
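Whatever the speed on a particular engine, COUNT(1) and COUNT(*) return the same number, because the constant 1 is never NULL. A minimal check, again using Python's sqlite3 module as a stand-in for MySQL and a hypothetical table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE group_relations (group_id INTEGER)")
conn.executemany("INSERT INTO group_relations VALUES (?)", [(1,), (None,), (None,)])

# The literal 1 is never NULL, so COUNT(1) counts every row, exactly like COUNT(*),
# even though two of the three group_id values are NULL.
count_one = conn.execute("SELECT COUNT(1) FROM group_relations").fetchone()[0]
count_star = conn.execute("SELECT COUNT(*) FROM group_relations").fetchone()[0]
print(count_one, count_star)  # 3 3
```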

+8
Jan 19 '09 at 11:14

I was curious. Reading documentation and theoretical answers is all well and good, but I like to balance them with empirical data.

I have a MySQL (InnoDB) table containing 5,607,997 records. The table is in my own sandbox, so I know the contents are static and nobody else is using the server; I think this effectively rules out any external influence on performance. The table has an auto_increment primary key field (Id) that I know will never be NULL, which I will use for the WHERE clause test (WHERE Id IS NOT NULL).

The only possible glitch I can see in running these tests is caching. The first time a query runs, it will always be slower than subsequent queries that use the same indexes; below I refer to this as the cache-seeding query. Just to mix it up a bit, I also ran each query with a WHERE clause that I know always evaluates to true regardless of the data (TRUE = TRUE).

Here are my results:

Query type | w/o WHERE          | WHERE Id IS NOT NULL | WHERE TRUE = TRUE
COUNT(*)   | 9 min 30.13 sec ++ | 6 min 16.68 sec ++   | 2 min 21.80 sec ++
COUNT(*)   | 6 min 13.34 sec    | 1 min 36.02 sec      | 2 min 0.11 sec
COUNT(*)   | 6 min 10.06 sec    | 1 min 33.47 sec      | 1 min 50.54 sec
COUNT(Id)  | 5 min 59.87 sec    | 1 min 34.47 sec      | 2 min 3.96 sec
COUNT(Id)  | 5 min 44.95 sec    | 1 min 13.09 sec      | 2 min 6.48 sec
COUNT(1)   | 6 min 49.64 sec    | 2 min 0.80 sec       | 2 min 11.64 sec
COUNT(1)   | 6 min 31.64 sec    | 1 min 41.19 sec      | 1 min 43.51 sec

++ This was the cache-seeding run; it is expected to be slower than the rest.

I would say the results speak for themselves. COUNT(Id) generally beats the others. Adding a WHERE clause significantly reduces access time, even when it is a condition you know evaluates to true. The sweet spot seems to be COUNT(Id) ... WHERE Id IS NOT NULL.

I would like to see other people's results, perhaps with smaller tables or with WHERE clauses on fields other than the one being counted. I'm sure there are other variables I haven't taken into account.
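For anyone who wants to reproduce the methodology (time the first, cache-seeding run separately, then repeat each variant), here is a minimal harness sketch. It assumes Python's sqlite3 module and a hypothetical 50,000-row in-memory table as a stand-in for the 5.6M-row InnoDB table, so absolute timings will not resemble the MySQL numbers above; pointing the same loop at a MySQL connection would need a MySQL driver, which is not shown here.

```python
import sqlite3
import time

# Hypothetical stand-in table; SQLite, in-memory storage and the row count are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (Id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT)")
conn.executemany("INSERT INTO t (payload) VALUES (?)", (("x",) for _ in range(50_000)))

QUERIES = {
    "COUNT(*)": "SELECT COUNT(*) FROM t",
    "COUNT(Id)": "SELECT COUNT(Id) FROM t",
    "COUNT(1)": "SELECT COUNT(1) FROM t",
    "COUNT(Id) WHERE Id IS NOT NULL": "SELECT COUNT(Id) FROM t WHERE Id IS NOT NULL",
    "COUNT(*) WHERE 1 = 1 (always true)": "SELECT COUNT(*) FROM t WHERE 1 = 1",
}


def bench(sql, runs=3):
    """Time one seeding run separately, then `runs` further runs of the same query."""

    def timed():
        start = time.perf_counter()
        value = conn.execute(sql).fetchone()[0]
        return value, time.perf_counter() - start

    seed_value, seed_time = timed()  # first run: warms caches, reported on its own
    later = [timed() for _ in range(runs)]
    if any(value != seed_value for value, _ in later):
        raise RuntimeError("row count changed between runs")
    return seed_value, seed_time, [elapsed for _, elapsed in later]


results = {name: bench(sql) for name, sql in QUERIES.items()}
for name, (value, seed, times) in results.items():
    print(f"{name:36} rows={value} seed={seed:.4f}s best={min(times):.4f}s")
```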

+5
Mar 19 '09 at 21:36

Look for alternatives

As you have seen, once tables grow large, COUNT queries become slow. I think the most important thing is to consider the nature of the problem you are trying to solve. For example, many developers use COUNT queries when building pagination for large record sets, to determine the total number of pages in the result set.

Knowing that COUNT queries will be slow, you could consider an alternative way of displaying pagination controls that simply avoids running the slow query. Google's pagination is a great example.
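The answer doesn't spell out a technique, so the following is only one common approach (an assumption here, not necessarily what Google does): fetch one row more than the page size, and if the extra row comes back, show a "Next" link. No COUNT over the whole table is needed. The sketch uses Python's sqlite3 module as a stand-in for MySQL; `fetch_page` and the table contents are hypothetical.

```python
import sqlite3


def fetch_page(conn, page, page_size=10):
    """Return (rows, has_next) without running COUNT(*) over the whole table.

    Fetches page_size + 1 rows; the extra row only signals that a next page exists.
    """
    offset = (page - 1) * page_size
    rows = conn.execute(
        "SELECT id, group_id FROM group_relations ORDER BY id LIMIT ? OFFSET ?",
        (page_size + 1, offset),
    ).fetchall()
    return rows[:page_size], len(rows) > page_size


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE group_relations (id INTEGER PRIMARY KEY, group_id INTEGER)")
conn.executemany(
    "INSERT INTO group_relations (group_id) VALUES (?)", [(i % 3,) for i in range(25)]
)

rows, has_next = fetch_page(conn, page=3)
print(len(rows), has_next)  # 5 False
```

The trade-off is that the UI can offer "Previous/Next" but not "page 7 of 4,821"; large OFFSET values also get slower, so keyset pagination (WHERE id > last_seen_id) is worth considering for deep pages.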

Denormalize

If you absolutely must know the number of records matching a particular criterion, consider the classic technique of data denormalization. Instead of counting rows at lookup time, increment a counter when a record is inserted and decrement it when a record is deleted.

If you decide to do this, consider using idempotent, transactional operations to keep these denormalized values in sync.

 START TRANSACTION; INSERT INTO `group_relations` (`group_id`) VALUES (1); UPDATE `group_relations_count` SET `count` = `count` + 1; COMMIT; 
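The insert-plus-increment pattern can be exercised end to end with Python's sqlite3 module (a stand-in for MySQL; the single-row `group_relations_count` schema and the helper names are hypothetical). Using the connection as a context manager commits on success and rolls back on an exception, so the row and the counter change together or not at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE group_relations (id INTEGER PRIMARY KEY, group_id INTEGER NOT NULL)")
# Hypothetical single-row counter table mirroring `group_relations_count`.
conn.execute("CREATE TABLE group_relations_count (count INTEGER NOT NULL)")
conn.execute("INSERT INTO group_relations_count (count) VALUES (0)")
conn.commit()


def add_relation(conn, group_id):
    # `with conn` commits if the block succeeds and rolls back if it raises.
    with conn:
        conn.execute("INSERT INTO group_relations (group_id) VALUES (?)", (group_id,))
        conn.execute("UPDATE group_relations_count SET count = count + 1")


def remove_relation(conn, relation_id):
    with conn:
        cur = conn.execute("DELETE FROM group_relations WHERE id = ?", (relation_id,))
        if cur.rowcount:  # only decrement if a row was actually deleted
            conn.execute("UPDATE group_relations_count SET count = count - 1")


add_relation(conn, 1)
add_relation(conn, 2)
remove_relation(conn, 1)
try:
    add_relation(conn, None)  # fails the NOT NULL constraint; nothing is committed
except sqlite3.IntegrityError:
    pass

stored = conn.execute("SELECT count FROM group_relations_count").fetchone()[0]
actual = conn.execute("SELECT COUNT(*) FROM group_relations").fetchone()[0]
print(stored, actual)  # 1 1
```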

Alternatively, you can use database triggers if your RDBMS supports them.
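A trigger-based version moves the bookkeeping into the database itself. The sketch below uses SQLite triggers through Python's sqlite3 module (an assumption for testability; MySQL's CREATE TRIGGER syntax differs slightly, for example it requires FOR EACH ROW). Table and trigger names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE group_relations (id INTEGER PRIMARY KEY, group_id INTEGER);
    CREATE TABLE group_relations_count (count INTEGER NOT NULL);
    INSERT INTO group_relations_count (count) VALUES (0);

    -- Keep the denormalized counter in step with every inserted and deleted row.
    CREATE TRIGGER group_relations_ai AFTER INSERT ON group_relations
    BEGIN
        UPDATE group_relations_count SET count = count + 1;
    END;
    CREATE TRIGGER group_relations_ad AFTER DELETE ON group_relations
    BEGIN
        UPDATE group_relations_count SET count = count - 1;
    END;
""")

conn.executemany("INSERT INTO group_relations (group_id) VALUES (?)", [(1,), (2,), (3,)])
conn.execute("DELETE FROM group_relations WHERE group_id = 2")
conn.commit()

print(conn.execute("SELECT count FROM group_relations_count").fetchone()[0])  # 2
```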

Depending on your architecture, it might make sense to use a caching layer such as memcached to store, increment and decrement the denormalized value, and simply fall back to the slow COUNT query when the cache key is missing. This can reduce overall write contention if you have very volatile data, although in such cases you may want to look into solutions to the dog-pile effect.
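The cache-with-fallback idea can be sketched as follows. `CountCache` is a hypothetical in-process dictionary standing in for memcached (mirroring the fact that memcached's incr does nothing for a missing key), and sqlite3 stands in for MySQL. The sketch does not address the dog-pile effect or races between the database write and the cache update.

```python
import sqlite3


class CountCache:
    """Hypothetical in-process stand-in for a memcached-style key/value store."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value

    def incr(self, key, delta=1):
        # Like memcached's incr/decr, do nothing when the key is missing;
        # the next read rebuilds the value from the database.
        if key in self._data:
            self._data[key] += delta


KEY = "group_relations:count"


def get_count(conn, cache):
    value = cache.get(KEY)
    if value is None:  # cache miss: fall back to the slow COUNT(*) and repopulate
        value = conn.execute("SELECT COUNT(*) FROM group_relations").fetchone()[0]
        cache.set(KEY, value)
    return value


def insert_relation(conn, cache, group_id):
    with conn:
        conn.execute("INSERT INTO group_relations (group_id) VALUES (?)", (group_id,))
    cache.incr(KEY)


def delete_relation(conn, cache, relation_id):
    with conn:
        cur = conn.execute("DELETE FROM group_relations WHERE id = ?", (relation_id,))
    if cur.rowcount:
        cache.incr(KEY, -1)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE group_relations (id INTEGER PRIMARY KEY, group_id INTEGER)")
cache = CountCache()

insert_relation(conn, cache, 1)   # cache still empty, so incr is a no-op
print(get_count(conn, cache))     # 1 (cache miss, computed with COUNT(*))
insert_relation(conn, cache, 2)
print(get_count(conn, cache))     # 2 (served from the cache)
delete_relation(conn, cache, 1)
print(get_count(conn, cache))     # 1
```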

+4
Jul 08 '09 at 6:40

MySQL MyISAM tables should have an optimization for COUNT(*) that skips the full table scan.

+2
Jan 19 '09 at 11:39

The asterisk in COUNT has nothing to do with the asterisk that selects all fields in a table. It is pure nonsense to say that COUNT(*) is slower than COUNT(field).

I would expect SELECT COUNT(*) to be faster than SELECT COUNT(field). When the RDBMS sees "*" in COUNT instead of a field, it doesn't need to evaluate anything to increment the counter. If you specify a field in COUNT, the RDBMS always has to check whether that field is NULL.

But if your field can contain NULLs and you want to count only the non-NULL values, specify the field in COUNT.

+2
Jan 19 '09 at 12:00

COUNT(*) facts and myths:

MYTH: "InnoDB doesn't handle COUNT(*) queries well":

Most COUNT(*) queries are executed the same way by all storage engines if you have a WHERE clause; otherwise, InnoDB has to perform a full table scan.

FACT: InnoDB doesn't optimize COUNT(*) queries without a WHERE clause.

+2
Mar 09 '09 at 20:56

It is best to count an indexed column such as a primary key.

 SELECT COUNT(`group_id`) FROM `group_relations` 
+2
Jun 22 '09 at 22:49

This should depend on what you are actually trying to achieve; as Sebastian said, make your intentions clear! If you are just counting rows, use COUNT(*); if you are counting the values in a single column, use COUNT(column).

It might be worth checking your database vendor as well. Back when I used Informix, it had an optimization for COUNT(*) that gave a query-plan cost of 1, whereas counting one or more columns produced a higher cost.

+1
Jan 19 '09 at 11:14

"If you try SELECT COUNT(1) FROM group_relations, it will be a little faster because it won't try to retrieve information from your columns."

COUNT(1) used to be faster than COUNT(*), but that is no longer true, because modern DBMSs are smart enough to know that you don't need any information about the columns.

+1
Jan 19 '09 at 11:44

The advice I got from MySQL about this sort of thing is that, in general, trying to optimize a query with tricks like this can be a curse in the long run. There are examples in MySQL's history where someone's high-performance technique, relying on how the optimizer worked, became a bottleneck in the next release.

Write the query that answers the question you are asking: if you want to count all rows, use COUNT(*). If you want to count non-NULL values in a column, use COUNT(col) ... WHERE col IS NOT NULL. Index appropriately and leave the optimizing to the optimizer. Trying to do your own query-level optimization can sometimes make the built-in optimizer less effective.

That said, there are things you can do in a query to make it easier for the optimizer to speed it up, but I don't believe COUNT is one of them.

Edit: The statistics in the answer above are interesting. I'm not sure whether something in the optimizer is at work in that case. I'm only talking about query-level optimization in general.
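Note that COUNT(col) already skips NULLs, so adding WHERE col IS NOT NULL does not change the number returned; it only states the intent explicitly. A quick check with Python's sqlite3 module as a stand-in for MySQL (the table `t` and its values are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (None,), (3,), (None,)])

all_rows = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
non_null = conn.execute("SELECT COUNT(col) FROM t").fetchone()[0]
filtered = conn.execute("SELECT COUNT(col) FROM t WHERE col IS NOT NULL").fetchone()[0]
print(all_rows, non_null, filtered)  # 4 2 2
```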

+1
Mar 27 '09 at 19:28

I know that it is generally a bad idea to make such requests:

 SELECT * FROM `group_relations` 

But when I just want a count, I use this query because it allows the table to change, but still gives the same results.

 SELECT COUNT(*) FROM `group_relations` 

As your question implies, the reason SELECT * is discouraged is that changes to the table could require changes to your code. That doesn't apply to COUNT(*). It is quite rare to need the specialized behavior that SELECT COUNT(`group_id`) gives; usually you just want to know the number of records. That is what COUNT(*) is for, so use it.
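Quoting matters here: in MySQL, `group_id` in backticks is a column name, but 'group_id' in single quotes is a string literal, which is never NULL, so COUNT('group_id') silently counts every row just like COUNT(*). The sketch below shows this with Python's sqlite3 module as a stand-in for MySQL (SQLite also treats single-quoted text in this position as a literal); the table contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE group_relations (group_id INTEGER)")
conn.executemany("INSERT INTO group_relations VALUES (?)", [(1,), (None,)])

# 'group_id' in single quotes is a string literal, never NULL, so every row counts.
literal = conn.execute("SELECT COUNT('group_id') FROM group_relations").fetchone()[0]
# An unquoted (or, in MySQL, backtick-quoted) name refers to the column and skips NULLs.
column = conn.execute("SELECT COUNT(group_id) FROM group_relations").fetchone()[0]
print(literal, column)  # 2 1
```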

0
Jun 22 '09 at 23:01


