Here is a complete working approach for Redshift.
Assumptions:
A. The files are available in S3 in gzip format, with columns delimited by '|'. They may contain some garbage data (see maxerror).
B. There is one sales fact table with two dimension tables, to keep it simple (TIME and SKU; a SKU can have many groups and categories).
C. You have a sales table like this.
CREATE TABLE sales ( sku_id int encode zstd, time_id int encode zstd, quantity numeric(10,2) encode delta32k );
1) Create a staging table that resembles the online table used by your application(s).
CREATE TABLE stg_sales_onetime ( sku_number varchar(255) encode zstd, time varchar(255) encode zstd, qty_str varchar(20) encode zstd, quantity numeric(10,2) encode delta32k, sku_id int encode zstd, time_id int encode zstd );
2) Copy the data from S3 (COPY can also load from a remote host over SSH).
copy stg_sales_onetime (sku_number,time,qty_str) from 's3://<bucket_name>/<full_file_path>' CREDENTIALS 'aws_access_key_id=<your_key>;aws_secret_access_key=<your_secret>' delimiter '|' ignoreheader 1 maxerror as 1000 gzip;
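Because maxerror lets COPY skip bad rows silently, it can help to look at what was rejected. A minimal sketch, assuming you have access to Redshift's STL_LOAD_ERRORS system table (the limit of 20 is arbitrary):

```sql
-- Show the most recent rows rejected by COPY, newest first.
select starttime, filename, line_number, colname, err_reason, raw_line
from stl_load_errors
order by starttime desc
limit 20;
```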
3) This step is optional. If your data is not well formatted, use it as a transformation step (for example, converting the string '12.555654' to the number 12.56).
update stg_sales_onetime set quantity=convert(decimal(10,2),qty_str);
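If qty_str may itself contain non-numeric garbage, the conversion can fail on those rows. One possible guarded variant, as a sketch only: it converts values that look like plain decimals and leaves quantity NULL for everything else. The pattern is an assumption about what counts as a valid quantity, so adjust it to your data.

```sql
-- Convert only strings shaped like an optional sign, digits,
-- and an optional fractional part; other rows keep quantity NULL.
update stg_sales_onetime
set quantity = convert(decimal(10,2), qty_str)
where qty_str ~ '^-?[0-9]+([.][0-9]+)?$';
```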
4) Populate the correct IDs from the dimension tables.
update stg_sales_onetime set sku_id=<your_sku_dimension_table>.sku_id from <your_sku_dimension_table> where stg_sales_onetime.sku_number=<your_sku_dimension_table>.sku_number;
update stg_sales_onetime set time_id=<your_time_dimension_table>.time_id from <your_time_dimension_table> where stg_sales_onetime.time=<your_time_dimension_table>.time;
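Rows that found no match in a dimension table keep a NULL ID after these updates. A quick check before loading the fact table might look like this (a sketch; the alias unmatched_rows is just illustrative):

```sql
-- Count staging rows whose SKU or time lookup found no match.
select count(*) as unmatched_rows
from stg_sales_onetime
where sku_id is null or time_id is null;
```

If the count is not zero, you may want to fix the dimension data or exclude those rows before the final insert.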
5) Finally, the data is ready to be moved from the staging table to the online sales table.
insert into sales(sku_id,time_id,quantity) select sku_id,time_id,quantity from stg_sales_onetime;