Transpose rows and columns (aka pivot) with only a minimum COUNT() value?

Question

Here is my tab_test table:

    year  animal   price
    2000  kittens     79
    2000  kittens     93
    2000  kittens    100
    2000  puppies     15
    2000  puppies     32
    2001  kittens     31
    2001  kittens     17
    2001  puppies     65
    2001  puppies     48
    2002  kittens     84
    2002  kittens     86
    2002  puppies     15
    2002  puppies     95
    2003  kittens     62
    2003  kittens     24
    2003  puppies     36
    2003  puppies     41
    2004  kittens     65
    2004  kittens     85
    2004  puppies     58
    2004  puppies     95
    2005  kittens     45
    2005  kittens     25
    2005  puppies     15
    2005  puppies     35
    2006  kittens     50
    2006  kittens     80
    2006  puppies     95
    2006  puppies     49
    2007  kittens     40
    2007  kittens     19
    2007  puppies     81
    2007  puppies     38
    2008  kittens     37
    2008  kittens     51
    2008  puppies     29
    2008  puppies     72
    2009  kittens     84
    2009  kittens     26
    2009  puppies     49
    2009  puppies     34
    2010  kittens     75
    2010  kittens     96
    2010  puppies     18
    2010  puppies     26
    2011  kittens     35
    2011  kittens     21
    2011  puppies     90
    2011  puppies     18
    2012  kittens     12
    2012  kittens     23
    2012  puppies     74
    2012  puppies     79

Here is some code that pivots the rows into columns, so I get the average price for “kittens” and “puppies” per year:

    SELECT year,
           AVG(CASE WHEN animal = 'kittens' THEN price END) AS "kittens",
           AVG(CASE WHEN animal = 'puppies' THEN price END) AS "puppies"
    FROM tab_test
    GROUP BY year
    ORDER BY year;

The output for the code above:

    year  kittens           puppies
    2000  90.6666666666667  23.5
    2001  24.0              56.5
    2002  85.0              55.0
    2003  43.0              38.5
    2004  75.0              76.5
    2005  35.0              25.0
    2006  65.0              72.0
    2007  29.5              59.5
    2008  44.0              50.5
    2009  55.0              41.5
    2010  85.5              22.0
    2011  28.0              54.0
    2012  17.5              76.5

I would like the result to look like the second table, but contain only those values whose year and animal have a COUNT() of at least 3 in the first table. In other words, the goal is to get this result:

    year  kittens
    2000  90.6666666666667

Only “kittens” in 2000 have at least 3 rows in the first table.
Is this possible in PostgreSQL?

+7

sql postgresql pivot crosstab

user1626730 Oct 31 '12 at 21:59

4 answers

`CASE`

If your case is as simple as shown, a CASE expression will do:

    SELECT year
         , sum(CASE WHEN animal = 'kittens' THEN price END) AS kittens
         , sum(CASE WHEN animal = 'puppies' THEN price END) AS puppies
    FROM  (
       SELECT year, animal, avg(price) AS price
       FROM   tab_test
       GROUP  BY year, animal
       HAVING count(*) > 2
       ) t
    GROUP  BY year
    ORDER  BY year;

It doesn't matter whether you use sum(), max() or min() as the aggregate function in the outer query. They all produce the same value here, because the subquery returns at most one row per year and animal.
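To see why the choice of outer aggregate is irrelevant, the subquery can be run on its own. This is only a sketch against the tab_test table from the question; the n_rows column is added here for illustration and is not part of the original query.

```sql
-- Inner query of the CASE variant, run on its own.
-- Each (year, animal) pair appears at most once in the result, so the
-- outer sum() / max() / min() only ever sees one non-NULL value per column.
SELECT year
     , animal
     , avg(price) AS price
     , count(*)   AS n_rows   -- illustrative only: shows how many rows were averaged
FROM   tab_test
GROUP  BY year, animal
HAVING count(*) > 2            -- same as count(*) >= 3
ORDER  BY year, animal;
```

With the sample data from the question, only kittens in 2000 have three rows, so that is the only group that passes the HAVING filter.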

SQL Fiddle

`crosstab()`

With more categories, this will be easier with the crosstab() query. It should also be faster for large tables.

You need to install the additional module tablefunc (once per database). Since Postgres 9.1 that is as simple as:

 CREATE EXTENSION tablefunc;

Details in this related answer:

PostgreSQL crosstab query
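As a small sketch of the installation step, assuming Postgres 9.1 or later and a role that is allowed to create extensions, the module can be installed idempotently and then looked up in the system catalog:

```sql
-- Install the module only if it is not there yet
-- (assumes sufficient privileges in the current database).
CREATE EXTENSION IF NOT EXISTS tablefunc;

-- Confirm that it is registered in the current database.
SELECT extname, extversion
FROM   pg_extension
WHERE  extname = 'tablefunc';
```

If the second query returns no row, crosstab() is not available in that database.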

    SELECT * FROM crosstab(
          'SELECT year, animal, avg(price) AS price
           FROM   tab_test
           GROUP  BY animal, year
           HAVING count(*) > 2
           ORDER  BY 1,2'

         ,$$VALUES ('kittens'::text), ('puppies')$$)
    AS ct ("year" text, "kittens" numeric, "puppies" numeric);

There is no sqlfiddle for this, because the site does not allow additional modules.

Benchmark

To verify my claims, I ran a quick test with close-to-real data in my small test database (PostgreSQL 9.1.6). Timed with EXPLAIN ANALYZE, best of 10.

Test setup with 10020 rows:

    CREATE TABLE tab_test (year int, animal text, price numeric);

    -- years with lots of rows
    INSERT INTO tab_test
    SELECT 2000 + ((g + random() * 300))::int/1000
         , CASE WHEN (g + (random() * 1.5)::int) %2 = 0 THEN 'kittens' ELSE 'puppies' END
         , (random() * 200)::numeric
    FROM   generate_series(1,10000) g;

    -- .. and some years with only few rows to include cases with count < 3
    INSERT INTO tab_test
    SELECT 2010 + ((g + random() * 10))::int/2
         , CASE WHEN (g + (random() * 1.5)::int) %2 = 0 THEN 'kittens' ELSE 'puppies' END
         , (random() * 200)::numeric
    FROM   generate_series(1,20) g;

Results:

@bluefeet
Total runtime: 95.401 ms

@wildplasser (different results, includes rows with count <= 3)
Total runtime: 64.497 ms

@Andriy (+ ORDER BY)
& @Erwin1 - CASE (both perform about the same)
Total runtime: 39.105 ms

@Erwin2 - crosstab()
Total runtime: 17.644 ms

With just 20 rows, the results are largely proportional (but irrelevant). Only @wildplasser's CTE has a bit more overhead and spikes a little.

With more than a handful of rows, crosstab() quickly takes the lead. @Andriy's query performs about the same as my simplified version; the aggregate function in the outer SELECT (min(), max(), sum()) makes no measurable difference (just two rows per group).

Everything as expected, no surprises. Take my setup and try it at home.
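A minimal sketch of how one of these timings can be reproduced, assuming the test table from the setup above has been created: prefix a candidate query with EXPLAIN ANALYZE, run it several times, and keep the best result.

```sql
-- EXPLAIN ANALYZE executes the query and reports the actual runtime
-- at the end of the plan output.
EXPLAIN ANALYZE
SELECT year
     , sum(CASE WHEN animal = 'kittens' THEN price END) AS kittens
     , sum(CASE WHEN animal = 'puppies' THEN price END) AS puppies
FROM  (
   SELECT year, animal, avg(price) AS price
   FROM   tab_test
   GROUP  BY year, animal
   HAVING count(*) > 2
   ) t
GROUP  BY year
ORDER  BY year;
```

Postgres 9.1 labels the figure "Total runtime"; newer versions label it "Execution time" instead. Absolute numbers depend on hardware and configuration, so only the relative order of the queries is comparable.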

+11

Erwin Brandstetter Oct 31 '12 at 23:40

This is what you are looking for:

    SELECT t1.year,
           AVG(CASE WHEN t1.animal = 'kittens' THEN t1.price END) AS "kittens",
           AVG(CASE WHEN t1.animal = 'puppies' THEN t1.price END) AS "puppies"
    FROM tab_test t1
    inner join
    (
      select animal, count(*) YearCount, year
      from tab_test
      group by animal, year
    ) t2
      on t1.animal = t2.animal
      and t1.year = t2.year
    where t2.YearCount >= 3
    group by t1.year

See SQL Fiddle with Demo

+3

Taryn Oct 31 '12 at 10:05

    CREATE TABLE pussyriot(year INTEGER NOT NULL
            , animal varchar
            , price integer
            );
    INSERT INTO pussyriot(year , animal , price ) VALUES
      (2000, 'kittens', 79)
    , (2000, 'kittens', 93)
    ...
    , (2007, 'puppies', 81)
    , (2007, 'puppies', 38)
    ;

    -- a self join is a poor man pivot:
    WITH cal AS ( -- generate calendar file
        SELECT generate_series(MIN(pr.year) , MAX(pr.year)) AS year
        FROM pussyriot pr
        )
    , fur AS (
        SELECT distinct year, animal, AVG(price) AS price
        FROM pussyriot
        GROUP BY year, animal
        -- UPDATE: added next line
        HAVING COUNT(*) >= 3
        )
    SELECT cal.year
        , pussy.price AS price_of_the_pussy
        , puppy.price AS price_of_the_puppy
    FROM cal
    LEFT JOIN fur pussy ON pussy.year=cal.year AND pussy.animal='kittens'
    LEFT JOIN fur puppy ON puppy.year=cal.year AND puppy.animal='puppies'
    ;

+2

wildplasser Oct 31 '12 at 10:55

Andriy m · Accepted Answer · 2012-10-31T22:32:03+0000

Here's an alternative to @bluefeet's suggestion, which is somewhat similar but avoids a join (instead, the top-level grouping is applied to an already grouped result set):

    SELECT
      year,
      MAX(CASE animal WHEN 'kittens' THEN avg_price END) AS "kittens",
      MAX(CASE animal WHEN 'puppies' THEN avg_price END) AS "puppies"
    FROM (
      SELECT
        animal,
        year,
        COUNT(*) AS cnt,
        AVG(Price) AS avg_price
      FROM tab_test
      GROUP BY
        animal,
        year
    ) s
    WHERE cnt >= 3
    GROUP BY
      year
    ;
