Why do I need "OR NULL" in MySQL when counting rows with a condition?

This question is about the MySQL aggregate function COUNT(), which keeps coming up for me. I would like an explanation of why it works the way it does.

When I started working with MySQL, I quickly found out that its COUNT(condition) only seems to work correctly if the condition also contains OR NULL at the end. For more complex COUNT conditions, figuring out exactly where to put it was a matter of trial and error. In MSSQL, you don't need this OR NULL to get correct results, so I would like an explanation. Here is an example.

Suppose you have a very basic table with the following structure and data:

 CREATE TABLE test (
   `value` int(11) NOT NULL
 ) ENGINE=MyISAM DEFAULT CHARSET=latin1;

 INSERT INTO test (value) VALUES(1);
 INSERT INTO test (value) VALUES(4);
 INSERT INTO test (value) VALUES(5);
 INSERT INTO test (value) VALUES(6);
 INSERT INTO test (value) VALUES(4);
 INSERT INTO test (value) VALUES(4);
 INSERT INTO test (value) VALUES(5);
 INSERT INTO test (value) VALUES(2);
 INSERT INTO test (value) VALUES(8);
 INSERT INTO test (value) VALUES(1);

Scenario: I would like to count how many rows have value = 4. The obvious solution would be to filter with WHERE and do COUNT(*), but I'm interested in a solution based on COUNT(condition).

So the solution that comes to my mind:

 SELECT COUNT(value=4) FROM test 

The result is 10. This is obviously wrong.

Second attempt with OR NULL:

 SELECT COUNT(value=4 OR NULL) FROM test 

The result is 3. This is correct.

Can someone explain the logic behind this? Is this some kind of bug in MySQL, or is there a logical explanation for why I need to add this weird-looking OR NULL to the end of the COUNT condition to get the correct result?

+5
5 answers

This should explain everything:

 SELECT 4=4, 3=4, 1 or null, 0 or null 

Output

 1 | 0 | 1 | NULL 

Facts

  • COUNT counts the columns / expressions that evaluate to NOT NULL. The count increases by 1 for every value that is not NULL, including 0. The exception is COUNT(DISTINCT), where it increases only if the value has not been counted yet.

  • When a BOOLEAN expression is used on its own, it returns 1 or 0.

  • When a boolean value is OR-ed with NULL, the result is NULL if the value is 0 (false) and 1 if the value is 1 (true).
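
The three facts above can be watched row by row on the test table from the question. This is only a sketch; the aliases is_four, is_four_or_null, counts_all and counts_fours are made up for illustration.

```sql
-- Per row: the bare comparison yields 1 or 0,
-- and OR NULL turns the 0 into NULL while leaving the 1 alone.
SELECT value,
       value = 4           AS is_four,          -- 1 for the three 4s, 0 otherwise
       (value = 4 OR NULL) AS is_four_or_null   -- 1 for the three 4s, NULL otherwise
FROM test;

-- COUNT only skips NULLs, so the two forms count different things.
SELECT COUNT(value = 4)         AS counts_all,   -- 10: 0 and 1 are both NOT NULL
       COUNT(value = 4 OR NULL) AS counts_fours  -- 3: only the rows that yield 1
FROM test;
```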

Other

Yes, if the count is the ONLY column you want, you can use WHERE value=4, but if the query needs to count the 4s as well as compute other counts / aggregates, the filter does not work. An alternative would be SUM(value=4), for example:

 SELECT sum(value=4) FROM test 
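
If portability matters, one commonly used alternative (not part of the original answer) is a CASE expression inside COUNT. A CASE without an ELSE branch yields NULL when no branch matches, so COUNT skips those rows. The aliases below are illustrative.

```sql
-- Several conditional counts from a single pass over the test table.
SELECT COUNT(CASE WHEN value = 4 THEN 1 END) AS fours,       -- 3
       COUNT(CASE WHEN value > 4 THEN 1 END) AS above_four,  -- 4 (5, 6, 5, 8)
       COUNT(*)                              AS total_rows   -- 10
FROM test;
```

One difference from the SUM form is worth keeping in mind: on an empty table COUNT returns 0, while SUM returns NULL.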
+12

The COUNT() function accepts an argument that it treats as either NULL or NOT NULL. If the argument is NOT NULL, it increments the count; otherwise it does nothing.

In your case, the expression value=4 is either TRUE or FALSE. Neither TRUE nor FALSE is NULL, so every row is counted and you get 10.

but I'm interested in a solution based on COUNT (condition).

A COUNT-based solution will usually be slower, because it forces a full table scan and a comparison on every row, whereas WHERE value = 4 can use an index on value if one exists.

+5

COUNT(expression) counts the number of rows for which the expression is not NULL. The expression value=4 is NULL if value is NULL; otherwise it is TRUE (1) or FALSE (0), both of which are counted.

 1 = 4         | FALSE
 4 = 4         | TRUE
 1 = 4 OR NULL | NULL
 4 = 4 OR NULL | TRUE
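
The table in the question is declared NOT NULL, so the NULL-input case never shows up there. A small sketch with a hypothetical nullable table (test_nullable and its aliases are assumptions, not from the question) shows how each form treats a NULL value:

```sql
-- Hypothetical table for illustration only.
CREATE TABLE test_nullable (value INT NULL);
INSERT INTO test_nullable (value) VALUES (4), (NULL), (1);

SELECT COUNT(*)                 AS total_rows,       -- 3
       COUNT(value = 4)         AS non_null_results, -- 2: NULL = 4 is NULL and is skipped
       COUNT(value = 4 OR NULL) AS fours,            -- 1
       SUM(value = 4)           AS fours_via_sum     -- 1: SUM ignores the NULL as well
FROM test_nullable;
```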

Instead, you can use SUM:

 SELECT SUM(value=4) FROM test 

This is not particularly useful in your specific example, but it can be useful if you want to count rows matching several different predicates using a single table scan, as in the following query:

 SELECT SUM(a>b) AS foo, SUM(b>c) AS bar, COUNT(*) AS total_rows FROM test 
+3

I would suggest using a more standard syntax that is portable across database engines and always produces the correct result.

  select count(*) from test where value = 4 

Is the syntax you are using a MySQL-specific variant?

0

This is because COUNT(expression) counts VALUES. In SQL theory, NULL is a STATE, not a VALUE, and therefore it is not counted. NULL is a state that means the field's value is unknown.

Now when you write "value = 4", it evaluates to a boolean value of TRUE or FALSE. Since TRUE and FALSE are VALUES, the result is 10.

When you add OR NULL, you actually have TRUE OR NULL and FALSE OR NULL. "TRUE OR NULL" evaluates to TRUE, and "FALSE OR NULL" evaluates to NULL. So the result is 3, because you only have 3 values (and seven NULL states).
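
The three-valued logic described above can be checked directly in MySQL. The NULLIF variant in the second query is an alternative spelling added here for illustration, not something the answer proposes.

```sql
-- OR yields TRUE as soon as one side is TRUE; otherwise an unknown side keeps it unknown.
-- AND is the mirror image: FALSE wins, otherwise NULL propagates.
SELECT TRUE OR NULL,    -- 1
       FALSE OR NULL,   -- NULL
       TRUE AND NULL,   -- NULL
       FALSE AND NULL;  -- 0

-- NULLIF(x, 0) maps 0 to NULL explicitly, which has the same effect as "x OR NULL" here.
SELECT COUNT(NULLIF(value = 4, 0)) AS fours  -- 3 on the test table
FROM test;
```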

0