I have many (millions of) small log files in S3, each named with a date/time to help identify it, i.e. servername-yyyy-mm-dd-HH-MM, e.g.
    s3://my_bucket/uk4039-2015-05-07-18-15.csv
    s3://my_bucket/uk4039-2015-05-07-18-16.csv
    s3://my_bucket/uk4039-2015-05-07-18-17.csv
    s3://my_bucket/uk4039-2015-05-07-18-18.csv
    ...
    s3://my_bucket/uk4339-2015-05-07-19-23.csv
    s3://my_bucket/uk4339-2015-05-07-19-24.csv
    ...

etc.
From EC2, using the AWS CLI, I would like to simultaneously download all files from 2015 that have a minute value of 16, but only for the uk4339 and uk4338 servers.
Is there any reasonable way to do this?
Also, if this is a terrible file structure in S3 for querying data, I would be extremely grateful for any advice on how to organise it better.
I can put the appropriate aws s3 cp ... command in a bash shell loop to download the corresponding files sequentially, but I was wondering if there is something more efficient.
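For concreteness, this is roughly the kind of loop I have in mind (a sketch only; expressing the minute-16 pattern via --exclude/--include filters is my guess at the right approach):

    # Sketch of the sequential approach: for each server, copy every
    # 2015 file whose minute field is 16 (i.e. ends in "-16.csv").
    # --exclude "*" drops everything, then --include re-adds matches.
    # With millions of objects, each --recursive call lists the whole
    # bucket, which is exactly the inefficiency I am worried about.
    for server in uk4339 uk4338; do
        aws s3 cp s3://my_bucket/ . --recursive \
            --exclude "*" \
            --include "${server}-2015-*-16.csv"
    done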
As an added bonus, to tie everything together, I would also like the results combined into one csv.
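For that part I assume something along these lines would do, keeping the header only once (this sketch assumes every file has the identical a1,b1,c1 header, as in the mock file below):

    # Combine the downloaded files into one csv: write the first file
    # in full, then append only the data rows of the rest
    # (tail -n +2 skips the header line of each subsequent file)
    first=1
    for f in uk43*-2015-*-16.csv; do
        if [ "$first" -eq 1 ]; then
            cat "$f" > combined.csv
            first=0
        else
            tail -n +2 "$f" >> combined.csv
        fi
    done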
A quick mock csv file can be generated in R with this line of code:
    # mock data: 100 rows, three normally distributed columns a1, b1, c1
    write.csv(data.frame(a1=rnorm(100), b1=rnorm(100), c1=rnorm(100)),
              file='uk4339-2015-05-07-19-24.csv', row.names=FALSE)
This creates the csv file uk4339-2015-05-07-19-24.csv. FYI, I will import the combined data into R at the end.