How to Turn into Google BigQuery

Suppose I have the following request sent to BQ:

SELECT shipmentID, category, quantity FROM [myDataset.myTable] 

In addition, suppose the query returns data such as:

 shipmentID category quantity 1 shoes 5 1 hats 3 2 shirts 1 2 hats 2 3 toys 3 2 books 1 3 shirts 1 

How can I modify the results from within BQ to produce the output as follows:

  shipmentID shoes hats shirts toys books 1 5 3 0 0 0 2 0 2 1 0 1 3 0 0 1 3 0 

As some additional background, I actually have categories with over 2000+ that I need to expand, and the amount of data is such that I cannot do it directly through the Pandas DataFrame in Python (uses all memory, then slows down the scan). I tried using a relational database, but ran into a column limit, so I would like to be able to do this directly in BQ, even if I need to build the query itself through python. Any suggestions?

** Edit 1 It should be noted that the rotation of the data itself can be performed in pieces and, therefore, is not a problem. The real problem is trying to do aggregation afterwards, so I only have one row for each delivery ID. This is what eats all the RAM.

** Modify 2 Having selected the accepted answer below, I found that an attempt to use it to create a pivot table of column 2k + is causing "Resources Exceeded" errors. My BQ team was able to reorganize the request in order to break it into smaller pieces and let it pass. The basic structure of the request is as follows:

 SELECT SetA.*, SetB.*, SetC.* FROM ( SELECT shipmentID, SUM(IF (category="Rocks", qty, 0)), SUM(IF (category="Paper", qty, 0)), SUM(IF (category="Scissors", qty, 0)) FROM ( SELECT a.shipmentid shipmentid, a.quantity quantity, a.category category FROM [myDataset.myTable] a) GROUP EACH BY shipmentID ) SetA INNER JOIN EACH ( SELECT shipmentID, SUM(IF (category="Jello Molds", quantity, 0)), SUM(IF (category="Torque Wrenches", quantity, 0)) FROM ( SELECT a.shipmentID shipmentID, a.quantity quantity, a.category category FROM [myDataset.myTable] a) GROUP EACH BY shipmentID ) SetB ON SetA.shipmentid = SetB.shipmentid INNER JOIN EACH ( SELECT shipmentID, SUM(IF (category="Deep Thoughts", qty, 0)), SUM(IF (category="Rainbows", qty, 0)), SUM(IF (category="Ponies", qty, 0)) FROM ( SELECT a.shipmentid shipmentid, a.quantity quantity, a.category category FROM [myDataset.myTable] a) GROUP EACH BY shipmentID ) SetC ON SetB.shipmentID = SetC.shipmentID 

The above pattern can be continued indefinitely by adding INNER JOIN EACH segments one by one. For my application, BQ was able to process about 500 columns per piece.

+5
source share
1 answer

This is the way:

 select shipmentID, sum(IF (category='shoes', quantity, 0)) AS shoes, sum(IF (category='hats', quantity, 0)) AS hats, sum(IF (category='shirts', quantity, 0)) AS shirts, sum(IF (category='toys', quantity, 0)) AS toys, sum(IF (category='books', quantity, 0)) AS books, from (select 1 as shipmentID, 'shoes' as category, 5 as quantity), (select 1 as shipmentID, 'hats' as category, 3 as quantity), (select 2 as shipmentID, 'shirts' as category, 1 as quantity), (select 2 as shipmentID, 'hats' as category, 2 as quantity), (select 3 as shipmentID, 'toys' as category, 3 as quantity), (select 2 as shipmentID, 'books' as category, 1 as quantity), (select 3 as shipmentID, 'shirts' as category, 1 as quantity), group by shipmentID 

This returns:

 +-----+------------+-------+------+--------+------+-------+---+ | Row | shipmentID | shoes | hats | shirts | toys | books | | +-----+------------+-------+------+--------+------+-------+---+ | 1 | 1 | 5 | 3 | 0 | 0 | 0 | | | 2 | 2 | 0 | 2 | 1 | 0 | 1 | | | 3 | 3 | 0 | 0 | 1 | 3 | 0 | | +-----+------------+-------+------+--------+------+-------+---+ 

See the manual for another example pivot table .

+5
source

All Articles