Reading a large CSV file from S3 into R

I need to load a 3 GB CSV file with approximately 18 million rows and 7 columns from S3 into R (RStudio). My code for reading data from S3 usually works as follows:

library("aws.s3")
obj <-get_object("s3://myBucketName/aFolder/fileName.csv")  
csvcharobj <- rawToChar(obj)  
con <- textConnection(csvcharobj)  
data <- read.csv(file = con)

Now that the file is much larger than usual, I get the following error message:

> csvcharobj <- rawToChar(obj)  
Error in rawToChar(obj) : long vectors not supported yet: raw.c:68

Reading this post, I understand that the vector is too long, but how would I chunk the data in this case? Any other suggestions on how to handle reading larger files from S3?
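
For reference, "chunking" here would mean reading the file piece by piece rather than all at once. Below is a minimal sketch, assuming the object has first been downloaded to a local file (for example with aws.s3::save_object()) and using readr::read_csv_chunked; the chunk size and the callback body are placeholders:

library(readr)

# Process the CSV 1e6 rows at a time; reduce each chunk in the callback so
# the full 18 million rows never need to be held in memory at once.
process_chunk <- function(chunk, pos) {
  # placeholder: filter or aggregate here; returning the chunk unchanged
  # simply reassembles the full table at the end
  chunk
}

data <- read_csv_chunked(
  "fileName.csv",                                  # local copy of the S3 object
  callback = DataFrameCallback$new(process_chunk),
  chunk_size = 1e6
)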

3 answers

If the file is too big to handle comfortably in R's memory, consider Spark: read the CSV there and work with it as a Spark DataFrame from R, e.g. via R Server or sparklyr.
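
A rough sketch of that route with sparklyr, assuming a local Spark instance and a locally downloaded copy of the file (a cluster configured with S3 credentials could read an s3a:// path directly); the table name and paths are placeholders:

library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "local")

csv_tbl <- spark_read_csv(
  sc,
  name = "big_csv",
  path = "fileName.csv",   # or an s3a://myBucketName/aFolder/... path on a configured cluster
  header = TRUE,
  infer_schema = TRUE
)

# The data stays in Spark; dplyr verbs are translated to Spark SQL.
csv_tbl %>% count()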


You can also download the CSV from S3 to a local file with save_object() and read it with data.table::fread():

library(aws.s3)
library(data.table)   # for fread()
library(magrittr)     # for the %>% pipe

data <- 
  save_object("s3://myBucketName/directoryName/fileName.csv") %>%
  fread()

This took about 305 seconds.

If you don't want to keep a copy of the CSV lying around, you can write it to a temporary file instead:

data <- 
  save_object("s3://myBucketName/directoryName/fileName.csv",
              file = tempfile(fileext = ".csv")
             ) %>%
  fread()

And if you are interested in where the temporary file is located, Sys.getenv() can give some idea; see the TMPDIR, TEMP, and TMP environment variables. Further information can be found in the base R documentation for tempfile().
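
For example, in a running R session:

tempdir()                                  # the session's temporary directory
Sys.getenv(c("TMPDIR", "TEMP", "TMP"))     # environment variables that influence it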
