Reading a large CSV file from S3 into R

I need to load a 3 GB CSV file with approximately 18 million rows and 7 columns from S3 into R (RStudio). My code for reading data from S3 usually works as follows:

library("aws.s3")
obj <-get_object("s3://myBucketName/aFolder/fileName.csv")  
csvcharobj <- rawToChar(obj)  
con <- textConnection(csvcharobj)  
data <- read.csv(file = con)

Now that the file is much larger than usual, I get the following error message:

> csvcharobj <- rawToChar(obj)  
Error in rawToChar(obj) : long vectors not supported yet: raw.c:68

Reading this post, I understand that the vector is too long, but how would I chunk the data in this case? Any other suggestions on how to handle reading larger files from S3?
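
For reference, "chunking" here would mean reading the file piece by piece rather than all at once. Below is a minimal sketch, assuming the object has first been downloaded to a local file (for example with aws.s3::save_object()) and using readr::read_csv_chunked; the chunk size and the callback body are placeholders:

library(readr)

# Process the CSV 1e6 rows at a time; reduce each chunk in the callback so
# the full 18 million rows never need to be held in memory at once.
process_chunk <- function(chunk, pos) {
  # placeholder: filter or aggregate here; returning the chunk unchanged
  # simply reassembles the full table at the end
  chunk
}

data <- read_csv_chunked(
  "fileName.csv",                                  # local copy of the S3 object
  callback = DataFrameCallback$new(process_chunk),
  chunk_size = 1e6
)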

3 answers

If the file is too big to handle comfortably in R's memory, consider Spark: read the CSV there and work with it as a Spark DataFrame from R, e.g. via R Server or sparklyr.
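
A rough sketch of that route with sparklyr, assuming a local Spark instance and a locally downloaded copy of the file (a cluster configured with S3 credentials could read an s3a:// path directly); the table name and paths are placeholders:

library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "local")

csv_tbl <- spark_read_csv(
  sc,
  name = "big_csv",
  path = "fileName.csv",   # or an s3a://myBucketName/aFolder/... path on a configured cluster
  header = TRUE,
  infer_schema = TRUE
)

# The data stays in Spark; dplyr verbs are translated to Spark SQL.
csv_tbl %>% count()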


You can also download the CSV from S3 to a local file with save_object() and read it with data.table::fread():

library(aws.s3)
library(data.table)   # for fread()
library(magrittr)     # for the %>% pipe

data <- 
  save_object("s3://myBucketName/directoryName/fileName.csv") %>%
  fread()

This took about 305 seconds.

If you don't want to keep a copy of the CSV lying around, you can write it to a temporary file instead:

data <- 
  save_object("s3://myBucketName/directoryName/fileName.csv",
              file = tempfile(fileext = ".csv")
             ) %>%
  fread()

And if you are interested in where the temporary file is located, Sys.getenv() can give some idea; see the TMPDIR, TEMP, and TMP environment variables. Further information can be found in the base R documentation for tempfile().
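
For example, in a running R session:

tempdir()                                  # the session's temporary directory
Sys.getenv(c("TMPDIR", "TEMP", "TMP"))     # environment variables that influence it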
