Explanation needed for missing rows with LEFT JOIN and COUNT(*)

Can someone please help me understand the following behavior that occurs when I add a WHERE clause to a query that has a LEFT JOIN with COUNT (*)?

I have two tables:

TABLE 1: customers

    customer_id | name
    ------------------
              1 | Bob
              2 | James
              3 | Fred

TABLE 2: orders

    order_id | customer_id | order_timestamp
    ----------------------------------------
        1000 |           1 | 2011-01-01 00:00
        1001 |           1 | 2011-01-05 00:00
        1002 |           2 | 2011-01-10 00:00

Now the following query tells me how many orders each customer placed:

    select c.customer_id, count(o.order_id)
    from customers c
    left join orders o using (customer_id)
    group by 1

    customer_id | count
    -------------------
              1 | 2
              2 | 1
              3 | 0

This works fine, but if I add a WHERE clause to the query, it no longer returns the zero counts for customers who did not place any orders, even though I am using a LEFT JOIN:

    select c.customer_id, count(o.order_id)
    from customers c
    left join orders o using (customer_id)
    where o.order_timestamp >= '2011-01-05'
    group by 1

    customer_id | count
    -------------------
              1 | 1
              2 | 1

Now, if I move the WHERE condition into the LEFT JOIN, as shown below, I get my zero counts back for customers who did not place orders:

    select c.customer_id, count(o.order_id)
    from customers c
    left join orders o
      on (c.customer_id = o.customer_id)
     and (o.order_timestamp >= '2011-01-05')
    group by 1

I am confused about why the second query does not work but the third does. Can someone please give me an explanation? Also, not sure if it matters, but I'm using Postgres. Thanks!

+7
3 answers

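This is because NULL is not greater than or equal to anything. For a customer with no orders, the LEFT JOIN produces a row whose order_timestamp is NULL, the comparison evaluates to NULL rather than true, and the WHERE clause discards that row. If you change the WHERE clause to where o.order_timestamp is null or o.order_timestamp >= '2011-01-05' , you will get the same result on this data as putting the condition in the join clause.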
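To make the NULL row visible, here is a small sketch that runs the join without the aggregate. It assumes PostgreSQL and inlines the question's sample data as CTEs so it can be pasted as a single statement; the passes_filter column is only a name made up for this illustration.

```sql
-- Sample data from the question, inlined as CTEs (PostgreSQL syntax assumed).
WITH customers (customer_id, name) AS (
    VALUES (1, 'Bob'), (2, 'James'), (3, 'Fred')
),
orders (order_id, customer_id, order_timestamp) AS (
    VALUES (1000, 1, timestamp '2011-01-01 00:00'),
           (1001, 1, timestamp '2011-01-05 00:00'),
           (1002, 2, timestamp '2011-01-10 00:00')
)
-- Un-aggregated LEFT JOIN: customer 3 has no orders, so every column
-- coming from orders is NULL on that row.
SELECT c.customer_id,
       o.order_id,
       o.order_timestamp,
       -- passes_filter is a hypothetical label: it shows what the WHERE
       -- condition would evaluate to for each joined row.
       o.order_timestamp >= '2011-01-05' AS passes_filter
FROM customers c
LEFT JOIN orders o USING (customer_id)
ORDER BY c.customer_id, o.order_id;
```

For customer 3 the passes_filter column comes back as NULL, not false. WHERE keeps only rows where the condition is true, so that row is dropped before the GROUP BY ever sees it.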

Please note that I would recommend the join clause approach, as it more closely matches what you are trying to do. Also, the change to the WHERE clause that I mentioned above will only work if the order_timestamp column is declared NOT NULL; if it is nullable, you should use another column for the null check (e.g. where o.primarykey is null or o.order_timestamp >= '2011-01-05' ).
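One reason the join clause approach is the safer match, as far as I can tell: the IS NULL workaround only rescues customers with no orders at all. The sketch below adds a hypothetical fourth customer (Alice) whose only order is before the cutoff; neither she nor order 1003 is in the original data. PostgreSQL is assumed, and the null check uses order_id as the non-nullable column.

```sql
WITH customers (customer_id, name) AS (
    VALUES (1, 'Bob'), (2, 'James'), (3, 'Fred'),
           (4, 'Alice')                                -- hypothetical customer
),
orders (order_id, customer_id, order_timestamp) AS (
    VALUES (1000, 1, timestamp '2011-01-01 00:00'),
           (1001, 1, timestamp '2011-01-05 00:00'),
           (1002, 2, timestamp '2011-01-10 00:00'),
           (1003, 4, timestamp '2011-01-02 00:00')     -- hypothetical old order
)
-- Variant 1: date filter in WHERE, with the IS NULL escape hatch.
SELECT 'filter in WHERE' AS variant, c.customer_id, count(o.order_id) AS order_count
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.customer_id
WHERE o.order_id IS NULL OR o.order_timestamp >= '2011-01-05'
GROUP BY c.customer_id
UNION ALL
-- Variant 2: date filter inside the ON clause.
SELECT 'filter in ON', c.customer_id, count(o.order_id)
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.customer_id
                  AND o.order_timestamp >= '2011-01-05'
GROUP BY c.customer_id
ORDER BY variant, customer_id;
```

Customer 4 shows up with a count of 0 only in the 'filter in ON' rows. In the WHERE variant her single joined row has a non-NULL order_id and an old timestamp, so it fails both halves of the condition and she disappears from the result.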

+3

Where you specify filtering criteria matters when working with OUTER JOINs (LEFT, RIGHT). Criteria in the ON clause of an OUTER JOIN are applied as part of the join itself; criteria in the WHERE clause are applied after the JOIN, to the result set that the JOIN produces.

    SELECT c.customer_id, COUNT(o.order_id)
    FROM CUSTOMERS c
    LEFT JOIN ORDERS o ON o.customer_id = c.customer_id
                      AND o.order_timestamp >= '2011-01-05'
    GROUP BY c.customer_id

Ordinals

Ordinals, that is, using a number that refers to the position of a column in the SELECT clause (as in GROUP BY 1), are not recommended practice. If someone changes the query, say by adding a column, this can drastically change what the query does.
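A minimal sketch of how an ordinal can change meaning without any error. It assumes PostgreSQL, and the second customer named Bob (customer_id 4) is hypothetical data added only for this illustration.

```sql
WITH customers (customer_id, name) AS (
    VALUES (1, 'Bob'), (2, 'James'), (3, 'Fred'),
           (4, 'Bob')                                  -- hypothetical second Bob
),
orders (order_id, customer_id, order_timestamp) AS (
    VALUES (1000, 1, timestamp '2011-01-01 00:00'),
           (1001, 1, timestamp '2011-01-05 00:00'),
           (1002, 2, timestamp '2011-01-10 00:00')
)
-- Suppose someone edited the SELECT list so that name now comes first.
-- GROUP BY 1 quietly switches from grouping by customer_id to grouping
-- by name, so the two customers called Bob are merged into one row.
SELECT c.name, count(o.order_id) AS order_count
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.customer_id
GROUP BY 1
ORDER BY 1;
```

Writing GROUP BY c.customer_id instead keeps the grouping tied to the column you meant, whatever happens to the SELECT list later.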

+4

Chirs is right: NULL is not greater than or equal to anything. So when you put your condition in the WHERE clause, it is applied to the final result set produced by the left join, and in this case the condition removes the row that has a NULL timestamp.

However, when you put the same condition in the join, it only restricts which rows of the orders table can match, and the left join is then carried out. A customer with no matching orders still gets a row, with NULLs in the orders columns, so that row is not removed.

So in the third query the condition is applied before the final result set is generated, and in the second query it is applied after the final result set has been created.
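To see the rows the third query actually aggregates, here is a sketch of the same join with the condition in ON but without COUNT or GROUP BY. It assumes PostgreSQL and inlines the question's sample data as CTEs.

```sql
WITH customers (customer_id, name) AS (
    VALUES (1, 'Bob'), (2, 'James'), (3, 'Fred')
),
orders (order_id, customer_id, order_timestamp) AS (
    VALUES (1000, 1, timestamp '2011-01-01 00:00'),
           (1001, 1, timestamp '2011-01-05 00:00'),
           (1002, 2, timestamp '2011-01-10 00:00')
)
-- The date condition only decides which orders may match a customer.
-- Customers without a matching order are still kept, padded with NULLs.
SELECT c.customer_id, o.order_id, o.order_timestamp
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.customer_id
                  AND o.order_timestamp >= '2011-01-05'
ORDER BY c.customer_id;
```

This returns one row per customer: order 1001 for customer 1 (order 1000 no longer matches), order 1002 for customer 2, and a NULL-padded row for customer 3. Since count(o.order_id) skips NULLs, customer 3 ends up with a count of 0 rather than vanishing.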

0