Pivot for Redshift database

Question


I know this question has been asked before, but none of the answers met my requirements, so I am asking it in a new thread.

In Redshift, how can I pivot data so that there is one row for each unique set of dimensions? For example, from:

 id    Name            Category    count
 8660  Iced Chocolate  Coffees     105
 8660  Iced Chocolate  Milkshakes  10
 8662  Old Monk        Beer        29
 8663  Burger          Snacks      18

to

 id    Name            Coffees  Milkshakes  Beer  Snacks
 8660  Iced Chocolate  105      10          0     0
 8662  Old Monk        0        0           29    0
 8663  Burger          0        0           0     18

The categories listed above keep changing. Redshift does not support the PIVOT operator, and a CASE expression will not help much (unless you can suggest how to make it work).

How can I achieve this result in Redshift?

(The above is just an example; we would have 1000+ categories, and these categories keep changing.)


sql amazon-redshift pivot

ankitkhanduri Mar 09 '17 at 11:24


3 answers

Sami yabroudi · Answer 1 · 2018-12-12T16:11:54+0000

We do a lot of pivoting at Ro, so we built a Python tool that automatically generates pivot queries. It supports the same basic options you would find in Excel, including choosing the aggregation function and whether you want overall aggregates.
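Ro's tool itself is not shown in the answer. As a rough sketch of the idea only, the Python below generates a CASE-based pivot query from a list of category values, so new categories require no hand-written SQL. The helper names (`build_pivot_query`, `quote_literal`, `quote_identifier`) and their parameters are hypothetical.

```python
def quote_literal(value: str) -> str:
    """Quote a string as a SQL string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(value: str) -> str:
    """Quote a string as a SQL identifier, doubling embedded double quotes."""
    return '"' + value.replace('"', '""') + '"'


def build_pivot_query(table, row_keys, pivot_col, value_col, categories,
                      agg="sum", include_total=False):
    """Build a SELECT with one conditional aggregate column per category.

    Hypothetical helper, not Ro's tool. Missing combinations contribute 0,
    which suits sum/max on non-negative counts but would skew avg.
    """
    cols = [quote_identifier(k) for k in row_keys]
    for cat in categories:
        cols.append(
            f"{agg}(case when {quote_identifier(pivot_col)} = {quote_literal(cat)}"
            f" then {quote_identifier(value_col)} else 0 end)"
            f" as {quote_identifier(cat)}"
        )
    if include_total:
        # Optional overall aggregate across all categories for each row.
        cols.append(f"{agg}({quote_identifier(value_col)}) as total")
    group_keys = ", ".join(quote_identifier(k) for k in row_keys)
    return ("select " + ",\n       ".join(cols)
            + f"\nfrom {quote_identifier(table)}"
            + f"\ngroup by {group_keys}")


print(build_pivot_query("my_table", ["id", "Name"], "Category", "count",
                        ["Coffees", "Milkshakes", "Beer", "Snacks"]))
```

In practice the category list would come from a `select distinct Category ...` query run first, and the generated query still has to stay under Redshift's 1600-column limit.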

user3600910 · Answer 2 · 2017-03-09T13:41:38+0000

I don’t think there is an easy way to do this in Redshift.

Also, you say that you have more than 1000 categories and that the number is growing, so keep in mind that Redshift has a limit of 1600 columns per table (see http://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_TABLE_usage.html).

You can use CASE, but then you need a separate CASE expression for each category:

 select id, name,
        sum(case when Category = 'Coffees' then count else 0 end) as Coffees,
        sum(case when Category = 'Milkshakes' then count else 0 end) as Milkshakes,
        sum(case when Category = 'Beer' then count else 0 end) as Beer,
        sum(case when Category = 'Snacks' then count else 0 end) as Snacks
 from my_table
 group by 1, 2
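To check the shape of the result, the sketch below runs the same conditional-aggregation query against an in-memory SQLite database loaded with the sample rows from the question. SQLite is only a stand-in for Redshift here (an assumption for illustration); the CASE/SUM pattern itself is plain SQL.

```python
import sqlite3

rows = [
    (8660, "Iced Chocolate", "Coffees", 105),
    (8660, "Iced Chocolate", "Milkshakes", 10),
    (8662, "Old Monk", "Beer", 29),
    (8663, "Burger", "Snacks", 18),
]

conn = sqlite3.connect(":memory:")
# "count" is quoted defensively because it is also an aggregate function name.
conn.execute('create table my_table (id integer, name text, category text, "count" integer)')
conn.executemany("insert into my_table values (?, ?, ?, ?)", rows)

# One SUM(CASE ...) per category; rows of other categories contribute 0.
pivot_sql = """
select id, name,
       sum(case when category = 'Coffees'    then "count" else 0 end) as coffees,
       sum(case when category = 'Milkshakes' then "count" else 0 end) as milkshakes,
       sum(case when category = 'Beer'       then "count" else 0 end) as beer,
       sum(case when category = 'Snacks'     then "count" else 0 end) as snacks
from my_table
group by id, name
order by id
"""
result = conn.execute(pivot_sql).fetchall()
for row in result:
    print(row)
```

Each output row matches one row of the desired table in the question.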

Another option is to load the table into R, for example, and use the cast function (dcast in the reshape2 package):

 library(reshape2)
 dcast(data, id + Name ~ Category, value.var = "count", fill = 0)

and then upload the result back to S3 or Redshift.
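The R route amounts to pulling the rows out, pivoting on the client, and writing a file to load back. A minimal equivalent using only the Python standard library is sketched below; `pivot_rows` is a hypothetical helper, and the CSV text it produces would still need to be uploaded to S3 and loaded into Redshift (for example with COPY).

```python
import csv
import io
from collections import defaultdict


def pivot_rows(rows):
    """Pivot (id, name, category, count) tuples into one row per (id, name).

    Categories are discovered from the data, so new ones need no code change.
    Missing (id, name, category) combinations are filled with 0.
    """
    categories = sorted({category for _, _, category, _ in rows})
    cells = defaultdict(lambda: defaultdict(int))
    for id_, name, category, count in rows:
        cells[(id_, name)][category] += count
    header = ["id", "name", *categories]
    table = [
        [id_, name, *(cells[(id_, name)][c] for c in categories)]
        for id_, name in sorted(cells)
    ]
    return header, table


rows = [
    (8660, "Iced Chocolate", "Coffees", 105),
    (8660, "Iced Chocolate", "Milkshakes", 10),
    (8662, "Old Monk", "Beer", 29),
    (8663, "Burger", "Snacks", 18),
]
header, table = pivot_rows(rows)

# Write CSV text that could be uploaded to S3 and loaded back.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(header)
writer.writerows(table)
print(buf.getvalue())
```

This avoids writing one CASE per category, but the 1600-column limit still applies to whatever Redshift table the result is loaded into.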

systemjack · Answer 3 · 2017-03-09T17:31:44+0000

If you usually only need to query specific subsets of the categories from the pivoted table, a workaround based on the approach suggested in the comments may work.

You can populate your pivot_table from the original table like this:

 insert into pivot_table (id, Name, json_cats) (
     select id, Name,
            '{' || listagg(quote_ident(Category) || ':' || count, ',')
                   within group (order by Category) || '}' as json_cats
     from to_pivot
     group by id, Name
 )
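The listagg expression builds one JSON object per (id, Name) group, with keys ordered by category. The Python sketch below mimics that aggregation on the sample rows so the resulting `json_cats` values can be inspected. It is an illustration only: `build_json_cats` is a hypothetical helper, and it uses `json.dumps` for key quoting where the SQL relies on `quote_ident`.

```python
import json
from itertools import groupby

rows = [
    (8660, "Iced Chocolate", "Coffees", 105),
    (8660, "Iced Chocolate", "Milkshakes", 10),
    (8662, "Old Monk", "Beer", 29),
    (8663, "Burger", "Snacks", 18),
]


def build_json_cats(rows):
    """Return {(id, name): json_text} with one JSON object per group.

    Client-side analogue of:
      '{' || listagg(<quoted Category> || ':' || count, ',')
             within group (order by Category) || '}'
    Assumes at most one row per (id, name, category).
    """
    result = {}
    ordered = sorted(rows, key=lambda r: (r[0], r[1], r[2]))
    for (id_, name), group in groupby(ordered, key=lambda r: (r[0], r[1])):
        # Keys are inserted in category order, like "within group (order by Category)".
        cats = {category: count for _, _, category, count in group}
        result[(id_, name)] = json.dumps(cats, separators=(",", ":"))
    return result


for key, json_cats in build_json_cats(rows).items():
    print(key, json_cats)
```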

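and then query individual categories like this (json_extract_path_text returns an empty string for a missing key, so nullif turns that into NULL before the cast, and nvl then defaults it to 0):

 select id, Name,
        nvl(nullif(json_extract_path_text(json_cats, 'Snacks'), '')::int, 0) as Snacks,
        nvl(nullif(json_extract_path_text(json_cats, 'Beer'), '')::int, 0) as Beer
 from pivot_table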
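Reading a category back is a key lookup with a default of 0 when the key is absent, which is what the `nvl(..., 0)` wrapper is for. A client-side Python equivalent is sketched below; the `pivot_table` list and `select_categories` helper are hypothetical stand-ins for the Redshift table and query.

```python
import json

# Hypothetical contents of pivot_table: (id, name, json_cats).
pivot_table = [
    (8660, "Iced Chocolate", '{"Coffees":105,"Milkshakes":10}'),
    (8662, "Old Monk", '{"Beer":29}'),
    (8663, "Burger", '{"Snacks":18}'),
]


def select_categories(table, categories):
    """Return (id, name, value per requested category), defaulting to 0."""
    out = []
    for id_, name, json_cats in table:
        cats = json.loads(json_cats)
        out.append((id_, name, *(int(cats.get(c, 0)) for c in categories)))
    return out


for row in select_categories(pivot_table, ["Snacks", "Beer"]):
    print(row)
```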

Using VARCHAR(MAX) for the JSON column gives you 65535 bytes, which should leave room for several thousand categories.
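How many categories fit depends on the length of the category names and counts. The sketch below makes the arithmetic explicit for the compact `"name":count` format built by the listagg query. The 13-byte names and 5-digit counts are illustrative assumptions, not figures from the question, and `json_cats_size`/`max_categories` are hypothetical helpers.

```python
import json

VARCHAR_MAX_BYTES = 65535  # VARCHAR(MAX) size quoted in the answer


def json_cats_size(counts):
    """UTF-8 byte length of the compact JSON object {"name":count,...}."""
    text = json.dumps(counts, separators=(",", ":"), ensure_ascii=False)
    return len(text.encode("utf-8"))


def max_categories(name_bytes, value_digits, limit=VARCHAR_MAX_BYTES):
    """Largest n such that n entries of the given size fit in `limit` bytes.

    Each entry "name":value costs name_bytes + 2 quotes + 1 colon + value_digits,
    plus (n - 1) commas and 2 braces overall, so
    total = n * (name_bytes + value_digits + 4) + 1.
    """
    per_entry = name_bytes + value_digits + 4
    return (limit - 1) // per_entry


# Hypothetical: 13-byte names like "category_0001" and 5-digit counts.
n = max_categories(13, 5)
counts = {f"category_{i:04d}": 12345 for i in range(n)}
print(n, json_cats_size(counts))
```

With these assumptions roughly 2,978 categories fit in one value; longer names, larger counts, or multi-byte characters reduce that number.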
