How to save the last processed S3 file to the Redshift database

At the moment, I copy data from Amazon S3 to Amazon Redshift using AWS Data Pipeline, but only for the current date and time. I want to copy data from S3 to Redshift every 30 minutes, and I also want to store the name of the last processed S3 file in another Redshift table.

Can anyone answer this question?

1 answer

You can use the RedshiftCopyActivity Data Pipeline object to do just that. The schedule field of the RedshiftCopyActivity object accepts a Data Pipeline Schedule object, which can run at 30-minute intervals. You will need to define the complete pipeline in JSON, including all the information about your AWS resources (Redshift data nodes, EC2 instances, and the S3 bucket and key). The source data file path in the JSON template can point to a static file that is overwritten every 30 minutes by whatever produces the data.
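As a rough sketch of what that JSON might look like: the bucket, table, and object IDs below are placeholders, and the Ec2Resource and RedshiftDatabase objects (which the activity also needs, via runsOn and the data node's database reference) are omitted for brevity.

```json
{
  "objects": [
    {
      "id": "Every30Minutes",
      "type": "Schedule",
      "period": "30 minutes",
      "startAt": "FIRST_ACTIVATION_DATE_TIME"
    },
    {
      "id": "S3Input",
      "type": "S3DataNode",
      "schedule": { "ref": "Every30Minutes" },
      "filePath": "s3://my-bucket/incoming/latest.csv"
    },
    {
      "id": "RedshiftOutput",
      "type": "RedshiftDataNode",
      "schedule": { "ref": "Every30Minutes" },
      "tableName": "my_table",
      "database": { "ref": "MyRedshiftDatabase" }
    },
    {
      "id": "CopyToRedshift",
      "type": "RedshiftCopyActivity",
      "schedule": { "ref": "Every30Minutes" },
      "input": { "ref": "S3Input" },
      "output": { "ref": "RedshiftOutput" },
      "insertMode": "APPEND"
    }
  ]
}
```

Here the static filePath (s3://my-bucket/incoming/latest.csv) is the file that your upstream process overwrites every 30 minutes; each scheduled run of the RedshiftCopyActivity picks up whatever is there at that moment.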

