SQL query for a frequency distribution: counting ranges with GROUP BY and including 0 counts

Given:

table 'thing':

     age
    ------
      3.4
      3.4
     10.1
     40
     45
     49

I want to count the number of things in each 10-year range, for example:

     age_range | count
    -----------+-------
             0 |     2
            10 |     1
            20 |     0
            30 |     0
            40 |     3

This query comes close:

    SELECT FLOOR(age / 10) AS age_range, COUNT(*)
    FROM thing
    GROUP BY FLOOR(age / 10)
    ORDER BY FLOOR(age / 10);

Output:

     age_range | count
    -----------+-------
             0 |     2
             1 |     1
             4 |     3

However, it does not show the ranges that have a count of 0. How can I modify the query so that it also shows the in-between ranges with 0 counts?

I found similar Stack Overflow questions on counting ranges, some of them including 0 counts, but they all require specifying each range (either hard-coding the ranges in the query or putting them in a table). I would prefer a generic query like the one above, where I do not have to specify each range explicitly (e.g. 0-10, 10-20, 20-30, ...). I am using PostgreSQL 9.1.3.

Is there a way to modify the simple query above so that it includes the 0 counts?

Similar:
Oracle: how to "group by" over a range?
Get frequency distribution of decimal range in MySQL

sql aggregate-functions group-by postgresql
2 answers

generate_series to the rescue:

    select 10 * s.d, count(t.age)
    from generate_series(0, 10) s(d)
    left outer join thing t on s.d = floor(t.age / 10)
    group by s.d
    order by s.d

Working out the upper bound for generate_series should be trivial with a separate query; I just used 10 as a placeholder.

This:

 generate_series(0, 10) s(d) 

essentially creates an inline table named s with a single column d that contains the values from 0 to 10 (inclusive).
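As a quick illustration of that inline table, the series can be selected on its own. This is only a sketch; the upper bound 4 and the column alias age_range are chosen here to match the example data and are not part of the answer above.

```sql
-- Illustrative only: s is the inline table, d its single integer column.
-- Multiplying d by 10 turns the bucket index into the bucket's lower bound.
select s.d, 10 * s.d as age_range
from generate_series(0, 4) as s(d);
```

This returns five rows, with d running from 0 to 4 and age_range from 0 to 40 in steps of 10.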

You can combine the two queries (one to determine the range, one to compute the counts) if necessary.
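One possible way to combine them, as a sketch rather than a definitive version: a scalar subquery supplies the upper bound in place of the placeholder 10. It assumes age is never negative; the coalesce (so that an empty table still yields a single 0 bucket) and the cast to int are additions made here, not part of the answer above.

```sql
-- Sketch: derive the upper bound of the series from the data itself.
select 10 * s.d as age_range, count(t.age) as count
from generate_series(
         0,
         -- index of the highest bucket; 0 if the table is empty
         (select coalesce(floor(max(age) / 10), 0)::int from thing)
     ) as s(d)
left outer join thing t on s.d = floor(t.age / 10)
group by s.d
order by s.d;
```

With the sample data in the question, the largest age is 49, so the series runs from 0 to 4 and the result has one row for each of the ranges 0, 10, 20, 30 and 40, including the two empty ones.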


You need some way to come up with a table of age ranges. Row numbers usually work nicely for this. Do a Cartesian product against a big table to get plenty of numbers.

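    WITH ranges AS (
        -- one row per 10-year bucket, from 0 up to the bucket holding max(age)
        SELECT (rownum - 1) * 10 AS age_range
        FROM ( SELECT row_number() OVER () AS rownum
               FROM pg_tables ) n,
             -- floor(...) + 1 rather than ceil(...), so that a maximum age
             -- that is an exact multiple of 10 still gets its own bucket
             ( SELECT floor(max(age) / 10) + 1 AS range_end
               FROM thing ) m
        WHERE n.rownum <= range_end
    )
    SELECT r.age_range, COUNT(t.age) AS count
    FROM ranges r
    LEFT JOIN thing t ON r.age_range = FLOOR(t.age / 10) * 10
    GROUP BY r.age_range
    ORDER BY r.age_range;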

EDIT: mu is too short has a more elegant answer, but if you did not have a generate_series function on your database, ... :)
