I currently use Amazon Redshift to store aggregated data from 50-100 GB (i.e., millions of lines) of tab-delimited files that are pushed into an Amazon S3 bucket every day.
Redshift makes this simple by providing a COPY command that can read directly from the S3 bucket for bulk data loading.
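For context, the Redshift COPY I run looks roughly like this (the table, bucket, and IAM role names here are placeholders, not my real ones):

-- Bulk load all tab-delimited files under the given S3 prefix into a Redshift table
COPY my_table
FROM 's3://mybucket/daily/'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
DELIMITER '\t';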
I would like to use Amazon Aurora RDS for the same purpose. The documentation on Aurora is thin at best right now. Is there a way to bulk load directly from S3 into Aurora?
As far as I can tell, MySQL's LOAD DATA INFILE requires a path to a file on local disk, which I suppose I can work around by copying the TSV files to an EC2 instance and running the command from there, although this is not ideal.
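The workaround I have in mind would be something like the following, run from a client with the file on its local disk (table name and path are placeholders; LOCAL also has to be enabled on both client and server):

-- Stream a local tab-delimited file from the client machine into the table
LOAD DATA LOCAL INFILE '/tmp/data.tsv'
INTO TABLE my_table
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n';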
I have also tried reading the file into memory and building multi-row INSERT statements. This is clearly slow and clumsy.
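That approach amounts to generating statements of this shape (column names and values are placeholders):

-- Insert many rows per statement, built from the parsed file contents
INSERT INTO my_table (col1, col2, col3) VALUES
  ('a', 1, 2),
  ('b', 3, 4);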
Ideas?
UPDATE 11/2016:
In Aurora version 1.8, you can now use the following commands to bulk load S3 data:
LOAD DATA FROM S3
or
LOAD XML FROM S3
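A minimal sketch of the tab-delimited case, assuming a table named my_table and a bucket named mybucket (both placeholders). Note that the Aurora cluster first needs an associated IAM role with read access to the bucket (set via the aurora_load_from_s3_role or aws_default_s3_role cluster parameter), and the database user needs the LOAD FROM S3 privilege:

-- Load a single tab-delimited object from S3 into an Aurora MySQL table
LOAD DATA FROM S3 's3://mybucket/daily/data.tsv'
INTO TABLE my_table
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
(col1, col2, col3);

The command also accepts a PREFIX keyword to load every object under a given key prefix instead of a single file.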
See the AWS documentation on loading data into an Aurora DB cluster from text files in an Amazon S3 bucket.