How to load data into Amazon Redshift via Python Boto3?

The Amazon Redshift Getting Started Guide retrieves data from Amazon S3 and loads it into an Amazon Redshift cluster using SQL Workbench/J. I would like to reproduce the same process of connecting to the cluster and loading sample data, but using Boto3 instead.

However, in the Boto3 documentation for Redshift, I cannot find a method that would allow me to load data into an Amazon Redshift cluster.

I managed to connect to Redshift using Boto3 with the following code:

client = boto3.client('redshift') 

But I'm not sure which method would allow me to create tables or load data into Amazon Redshift, as was done in the tutorial with SQL Workbench/J.

python amazon-s3 amazon-web-services amazon-redshift boto3
2 answers

Go back to step 4 in the tutorial you linked, where it shows you how to get the cluster URL. You connect to that URL with a PostgreSQL driver. AWS SDKs such as Boto3 only give you access to the AWS management API (creating and describing clusters, and so on); to run SQL you connect to Redshift through the PostgreSQL protocol, just as you would connect to a PostgreSQL database in RDS.
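For example, here is a minimal sketch of that approach. The cluster identifier, database name, and credentials are placeholders; Boto3 is only used to look up the endpoint, and psycopg2 does the actual SQL work.

import boto3
import psycopg2

# Use the AWS API (Boto3) only to look up the cluster endpoint
redshift = boto3.client('redshift')
cluster = redshift.describe_clusters(ClusterIdentifier='examplecluster')['Clusters'][0]
endpoint = cluster['Endpoint']

# Connect over the PostgreSQL protocol, as you would to PostgreSQL on RDS
con = psycopg2.connect(
    host=endpoint['Address'],
    port=endpoint['Port'],
    dbname='dev',
    user='awsuser',
    password='my_password',
)

# From here you can create tables and run COPY like any other SQL statement
cur = con.cursor()
cur.execute("CREATE TABLE IF NOT EXISTS my_sample (id INT, name VARCHAR(50));")
con.commit()
cur.close()
con.close()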


That's right, you need the psycopg2 Python module to execute the COPY command.

My code is as follows:

import psycopg2

# Amazon Redshift connect string
conn_string = "dbname='***' port='5439' user='***' password='***' host='mycluster.***.redshift.amazonaws.com'"

# connect to Redshift (database should be open to the world)
con = psycopg2.connect(conn_string)

sql = """COPY %s FROM '%s' credentials 'aws_access_key_id=%s;aws_secret_access_key=%s' delimiter '%s' FORMAT CSV %s %s; commit;""" % (to_table, fn, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, delim, quote, gzip)
# Here:
#   fn   - s3://path_to__input_file.gz
#   gzip - 'gzip'

cur = con.cursor()
cur.execute(sql)
con.close()
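For illustration only, the placeholders in the snippet above might be filled in like this (bucket, path, and table names are made up):

to_table = 'public.my_table'               # target Redshift table
fn = 's3://my-bucket/path/input_file.gz'   # S3 object to load
AWS_ACCESS_KEY_ID = '***'
AWS_SECRET_ACCESS_KEY = '***'
delim = ','        # field delimiter
quote = ''         # extra CSV option, e.g. QUOTE '\"', or empty
gzip = 'gzip'      # tells COPY the file is gzip-compressed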

I used boto3 and psycopg2 to write CSV_Loader_For_Redshift.

