BigQuery - remove an unused column from a schema

Question

I accidentally added the wrong column to my BigQuery table.

Instead of reloading the full table (millions of rows), I would like to know if the following is possible:

Delete the bad rows (rows that contain values in the wrong column) by running a "select *" query on the table with some kind of filter and saving the result to the same table.
Delete the (now) unused column.

Is this functionality (or something similar) supported? Perhaps the "save result to table" feature could have a "compact schema" option.

+6

google-bigquery

Lior Feb 15 '16 at 10:08

3 answers

Mikhail Berlyant · Answer 1 · 2016-02-15T19:27:33+0000

If your table does not contain record / repeated fields, your simple option is:

1. Select the valid columns while filtering out the bad records, and write the result to a new temporary table, YourTable_Temp:
   SELECT <list of source columns>
   FROM YourTable
   WHERE <filter to remove bad entries here>
2. Make a backup copy of the broken table as YourTable_Backup
3. Delete YourTable
4. Copy YourTable_Temp to YourTable
5. Check that everything looks as expected and, if so, get rid of the temporary and backup tables

Please note: the cost of #1 above is exactly the same as the cost of the action in the first item of your question. The remaining actions (copies) are free.
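The plan above can be sketched in Python. This is only an illustration, not the answer author's code: the helper name build_cleanup_query, the column names, and the filter expression are hypothetical, and the snippet only builds the query string and lists the table operations rather than calling BigQuery.

```python
def build_cleanup_query(table, columns, drop_column, row_filter):
    """Build the step 1 SELECT: keep every column except drop_column
    and keep only the rows that satisfy row_filter."""
    kept = [c for c in columns if c != drop_column]
    if not kept:
        raise ValueError("at least one column must remain")
    return "SELECT {} FROM {} WHERE {}".format(", ".join(kept), table, row_filter)


# Hypothetical layout: wrong_col is the accidentally added column,
# and rows are considered valid when it was never populated.
query = build_cleanup_query(
    table="YourTable",
    columns=["id", "name", "wrong_col"],
    drop_column="wrong_col",
    row_filter="wrong_col IS NULL",
)

# The remaining steps are plain table operations (copy / delete), in order.
followup_steps = [
    ("copy", "YourTable", "YourTable_Backup"),
    ("delete", "YourTable", None),
    ("copy", "YourTable_Temp", "YourTable"),
    ("delete", "YourTable_Temp", None),
    ("delete", "YourTable_Backup", None),
]

print(query)
```

The query result is meant to be saved to YourTable_Temp; the last two deletes should only run after the rebuilt YourTable has been checked.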

If you have repeated / record fields, you can still execute the plan above, but in #1 you will need to use BigQuery User-Defined Functions to get the correct output schema.
You can see the examples below. Of course, this will require some extra development work, but if you are in a critical situation, it should work for you.

Create a table with a record type column
create table with column record type

Hopefully, at some point, the Google BigQuery team will add better support for cases like yours, where you need to manipulate and output repeated / record data, but for now this is the best workaround I have found, at least for myself.

Pentium10 · Answer 2 · 2016-02-15T10:33:09+0000

Saving the query result to a table is the way to go. Try it on the big table, selecting only the columns you are interested in; you can apply a limit to keep the test small.

George · Answer 3 · 2017-01-20T18:08:28+0000

The 5 steps in the first answer above should work well. In detail, you must specify allowLargeResults: true and flattenResults: false in step 1. Setting allowLargeResults to true lets the query return results larger than 128 MB. Setting flattenResults to false stops repeated fields from being flattened in the result.

As an additional note, the query results can be written back to the original table by setting writeDisposition: WRITE_TRUNCATE.
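As a rough sketch, the options named in this answer could be combined into a query job configuration shaped like the one the BigQuery REST API accepts. The helper name and the project, dataset, and table IDs are hypothetical, and the snippet only builds and prints the configuration; it does not submit a job. Note that the field name used by the REST API is flattenResults.

```python
import json


def build_query_job_config(project_id, dataset_id, table_id, sql):
    """Build a query job configuration that overwrites the destination table."""
    return {
        "configuration": {
            "query": {
                "query": sql,
                "destinationTable": {
                    "projectId": project_id,
                    "datasetId": dataset_id,
                    "tableId": table_id,
                },
                # Allow results beyond the normal response size limit;
                # this requires a destination table.
                "allowLargeResults": True,
                # Keep repeated / record fields nested in the output.
                "flattenResults": False,
                # Replace the destination table's contents with the result.
                "writeDisposition": "WRITE_TRUNCATE",
            }
        }
    }


# Hypothetical identifiers and query, for illustration only.
job = build_query_job_config(
    project_id="my-project",
    dataset_id="my_dataset",
    table_id="YourTable",
    sql="SELECT id, name FROM my_dataset.YourTable WHERE wrong_col IS NULL",
)

print(json.dumps(job, indent=2))
```

Because WRITE_TRUNCATE overwrites the table in place, taking the backup copy from the first answer before running such a job seems prudent.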
