Bulk update existing rows in Redshift

It seems like it should be easy, but it is not. I am porting a query from MySQL to Redshift:

INSERT INTO table (...) VALUES (...) ON DUPLICATE KEY UPDATE value = LEAST(value, VALUES(value))

Rows whose primary keys are not yet in the table should simply be inserted. For primary keys that already exist, the row values should be updated based on a condition that depends on both the existing and the incoming values.

http://docs.aws.amazon.com/redshift/latest/dg/merge-replacing-existing-rows.html does not work for me, because the filter_expression in my case depends on the current records in the table. I am currently creating a staging table, loading it with a COPY statement, and looking for a better way to merge the staging and real tables.
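One pattern that seems to fit this case is a two-statement merge inside a transaction: an UPDATE that applies the LEAST logic to keys that already exist, followed by an INSERT of the keys that do not. A minimal sketch, assuming a hypothetical target table target with primary key pk, a staging table staging, and Redshift's LEAST function:

 begin;

 -- Existing keys: keep the smaller of the current and the staged value,
 -- mirroring MySQL's ON DUPLICATE KEY UPDATE ... LEAST(...).
 update target
 set value = least(target.value, s.value)
 from staging s
 where target.pk = s.pk;

 -- New keys: insert staged rows that have no match in the target yet.
 insert into target (pk, value)
 select s.pk, s.value
 from staging s
 left outer join target t on t.pk = s.pk
 where t.pk is null;

 commit;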

+7
sql postgresql amazon-redshift
3 answers

I needed to do exactly this for a project recently. The method I use involves 3 steps:

1.

Run an update that addresses the changed fields (I update whether or not the fields have changed, but you could qualify the update to skip unchanged rows; see the sketch after step 3):

 update table1
 set col1 = s.col1, col2 = s.col2, ...
 from stagetable s
 where table1.primkey = s.primkey;

2.

Run an insert that addresses the new records:

 insert into table1
 select s.*
 from stagetable s
 left outer join table1 t on s.primkey = t.primkey
 where t.primkey is null;

3.

Mark rows that are no longer in the source as inactive (our reporting tool uses views that filter out inactive records):

 update table1
 set is_active_flag = 'N', last_updated = sysdate
 where not exists (select 1 from stagetable s where s.primkey = table1.primkey);
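As noted in step 1, the update can be qualified so that rows whose staged values already match are skipped, avoiding needless rewrites. A sketch, assuming the same table and column names as above:

 -- Same update as step 1, but only touch rows where something changed.
 -- Note: <> never matches when either side is NULL, so rows changing
 -- to or from NULL would be skipped; adjust if the columns are nullable.
 update table1
 set col1 = s.col1, col2 = s.col2
 from stagetable s
 where table1.primkey = s.primkey
   and (table1.col1 <> s.col1 or table1.col2 <> s.col2);

Running the three steps inside a single begin ... commit block also keeps concurrent readers from seeing a half-merged table.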
+8

You could create a temporary staging table. In Redshift, it is better to delete the matching records and re-insert them. Check out this document:

http://docs.aws.amazon.com/redshift/latest/dg/merge-replacing-existing-rows.html
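For reference, the delete-and-insert merge that document describes looks roughly like this (a sketch, reusing the stagetable and primkey names from the answer above):

 begin;

 -- Drop the target rows that are about to be replaced.
 delete from table1
 using stagetable s
 where table1.primkey = s.primkey;

 -- Re-insert every staged row: replacements and new records alike.
 insert into table1
 select * from stagetable;

 commit;

Note that a plain delete-and-insert replaces old values unconditionally, so a condition like the LEAST logic from the question would have to be applied while loading the staging table.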

0

Here is a complete working approach for Redshift.

Assumptions:

A. Data is available in S3 as gzip files with '|'-delimited columns; it may contain some garbage rows (see maxerror below).

B. A sales fact table with two dimension tables, TIME and SKU, to keep things simple (a SKU can belong to many groups and categories).

C. You have a sales table like this:

 CREATE TABLE sales (
   sku_id   int           encode zstd,
   date_id  int           encode zstd,
   quantity numeric(10,2) encode delta32k
 );

1) Create a one-time staging table that resembles the online table used by your application(s):

 CREATE TABLE stg_sales_onetime (
   sku_number varchar(255)  encode zstd,
   time       varchar(255)  encode zstd,
   qty_str    varchar(20)   encode zstd,
   quantity   numeric(10,2) encode delta32k,
   sku_id     int           encode zstd,
   date_id    int           encode zstd
 );

2) Copy the data from S3 into the staging table (this can also be done from SSH):

 copy stg_sales_onetime (sku_number, time, qty_str)
 from 's3://<bucket_name>/<full_file_path>'
 CREDENTIALS 'aws_access_key_id=<your_key>;aws_secret_access_key=<your_secret>'
 delimiter '|' ignoreheader 1 maxerror as 1000 gzip;
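Because maxerror tolerates up to 1000 bad rows silently, it can be worth inspecting what was skipped once the COPY finishes. A sketch using Redshift's stl_load_errors system table:

 -- Show the most recent load errors swallowed by maxerror.
 select starttime, filename, line_number, colname, err_reason
 from stl_load_errors
 order by starttime desc
 limit 20;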

3) This step is optional. If your data is not well formatted, this is where you transform it if necessary (for example, converting the string '12.555654' to the number 12.56):

 update stg_sales_onetime set quantity=convert(decimal(10,2),qty_str); 

4) Fill in the correct IDs from the dimension tables:

 update stg_sales_onetime
 set sku_id = <your_sku_dimension_table>.sku_id
 from <your_sku_dimension_table>
 where stg_sales_onetime.sku_number = <your_sku_dimension_table>.sku_number;

 update stg_sales_onetime
 set date_id = <your_time_dimension_table>.date_id
 from <your_time_dimension_table>
 where stg_sales_onetime.time = <your_time_dimension_table>.time;

5) Finally, the data can be moved from the staging table to the online sales table:

 insert into sales (sku_id, date_id, quantity) select sku_id, date_id, quantity from stg_sales_onetime;
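One optional safeguard around that final insert: verify that the lookups in step 4 matched every staged row, since unmatched rows would carry NULL keys into the fact table. A sketch:

 -- Count staged rows whose dimension lookups failed in step 4.
 select count(*) as unmatched_rows
 from stg_sales_onetime
 where sku_id is null or date_id is null;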
0
