For the purposes of this answer, Redshift behaves like Postgres: it speaks the Postgres wire protocol, so the same drivers work. You have two options:
Option 1:
From the pandas documentation: http://pandas.pydata.org/pandas-docs/stable/io.html#io-sql
The pandas.io.sql module provides a set of query wrappers to facilitate data retrieval and reduce dependency on a database-specific API. Database abstraction is provided by SQLAlchemy, if installed. In addition, you will need a driver library for your database. Examples of such drivers are psycopg2 for PostgreSQL or pymysql for MySQL.
Writing DataFrames
Assuming the following data is in a DataFrame, we can insert it into the database using to_sql().
id       Date Col_1  Col_2  Col_3
26 2012-10-18     X   25.7   True
42 2012-10-19     Y  -12.4  False
63 2012-10-20     Z   5.73   True

In [437]: data.to_sql('data', engine)
In some databases, writing large DataFrames can lead to errors due to exceeding packet size limits. This can be avoided by setting the chunksize parameter when calling to_sql. For example, the following writes data to the database in batches of 1000 rows at a time:
In [438]: data.to_sql('data_chunked', engine, chunksize=1000)
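Both to_sql calls above assume an engine object. Here is a minimal sketch of creating one for a Redshift cluster with SQLAlchemy and psycopg2 (the endpoint, database name, and credentials below are placeholders):

from sqlalchemy import create_engine

# Redshift accepts the standard Postgres driver, so the psycopg2 dialect works.
# Replace the placeholder endpoint and credentials with your own cluster details.
engine = create_engine(
    "postgresql+psycopg2://user:password@my-cluster.example.redshift.amazonaws.com:5439/mydb"
)

# Then the chunked write from above:
data.to_sql('data_chunked', engine, chunksize=1000)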
Option 2:
Or you can roll your own. If you have a DataFrame called data, iterate over it using iterrows (note that iterrows yields (index, row) pairs):

for index, row in data.iterrows():

and add each row to your database. I would use COPY instead of an INSERT per row, as it will be much faster:
http://initd.org/psycopg/docs/usage.html#using-copy-to-and-copy-from
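Here is a minimal sketch of that COPY approach with psycopg2, assuming a DataFrame called data loaded into a table named data over a Postgres-compatible connection; the connection parameters are placeholders, and the plain-text COPY below assumes the values contain no embedded tabs or newlines:

import io

import psycopg2

# Placeholder connection details -- point these at your own database.
conn = psycopg2.connect(
    host="my-cluster.example.redshift.amazonaws.com",
    port=5439,
    dbname="mydb",
    user="user",
    password="password",
)

# Dump the DataFrame into an in-memory buffer, then load it with a single
# COPY instead of issuing one INSERT per row.
buf = io.StringIO()
data.to_csv(buf, sep='\t', index=False, header=False)
buf.seek(0)

with conn.cursor() as cur:
    cur.copy_from(buf, 'data', sep='\t', columns=list(data.columns))
conn.commit()

Staging the rows in a buffer and issuing one COPY avoids the per-row round trips you would get from pairing iterrows with individual INSERT statements.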