Import large xlsx file into R?

I am wondering if anyone knows of a way to import data from a "large" xlsx file (~20 MB). I tried using the xlsx and XLConnect libraries. Unfortunately, both use rJava, and I always get the same error:

    > library(XLConnect)
    > wb <- loadWorkbook("MyBigFile.xlsx")
    Error: OutOfMemoryError (Java): Java heap space

or

    > library(xlsx)
    > mydata <- read.xlsx2(file="MyBigFile.xlsx")
    Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, :
      java.lang.OutOfMemoryError: Java heap space

I also tried changing java.parameters before loading rJava:

    > options( java.parameters = "-Xmx2500m")
    > library(xlsx) # load rJava
    > mydata <- read.xlsx2(file="MyBigFile.xlsx")
    Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, :
      java.lang.OutOfMemoryError: Java heap space

or after loading rJava (this is a bit silly, I think):

    > library(xlsx) # load rJava
    > options( java.parameters = "-Xmx2500m")
    > mydata <- read.xlsx2(file="MyBigFile.xlsx")
    Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, :
      java.lang.OutOfMemoryError: Java heap space

But nothing works. Anyone have an idea?

+50
r excel xlsx
02 Oct '13 at
7 answers

I came across this question when someone sent me (yet another) Excel file to analyze. This one is not even that big, but for some reason I ran into a similar error:

 java.lang.OutOfMemoryError: GC overhead limit exceeded 

Based on @Dirk Eddelbuettel's comment on the previous answer, I installed the openxlsx package ( http://cran.r-project.org/web/packages/openxlsx/ ) and then ran:

    library("openxlsx")
    mydf <- read.xlsx("BigExcelFile.xlsx", sheet = 1, startRow = 2, colNames = TRUE)

That was what I was looking for. Easy to use and very fast. This is my new BFF. Thanks for the hint, @Dirk E!

By the way, I am not trying to take this answer away from Dirk E, so if he posts an answer, please accept it, not mine!
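
If the whole sheet is not needed, read.xlsx can also be limited to particular rows and columns through its rows and cols arguments, which keeps the resulting data frame small. A minimal sketch, assuming openxlsx is installed; the demo file and the names demo_file and subset_df are made up for illustration and are not part of the original answer:

```r
library(openxlsx)

# Build a small demo workbook so the example is self-contained
# (demo_file is a hypothetical stand-in for BigExcelFile.xlsx).
demo_file <- tempfile(fileext = ".xlsx")
write.xlsx(data.frame(id = 1:100, value = (1:100) * 2, label = "a"), demo_file)

# Read the header row plus the first 10 data rows, and only columns 1 and 2.
subset_df <- read.xlsx(demo_file, sheet = 1, rows = 1:11, cols = 1:2)

dim(subset_df)
```

For a real file, replace demo_file with the path to the workbook and adjust rows and cols to the region you need.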

+93
Aug 14 '14 at 21:28

    options(java.parameters = "-Xmx2048m")  ## memory set to 2 GB
    library(XLConnect)

Allow more memory through java.parameters before loading any Java component. Then load the XLConnect library (it uses Java).

That's it. Start reading in data with readWorksheet and so on. :)
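
One way to confirm that the setting took effect is to ask the JVM for its maximum heap size. This is only a sketch, assuming rJava is installed and that no JVM has been started yet in the session; runtime and max_heap_mb are illustrative names:

```r
# Must run in a fresh R session, before any package starts the JVM.
options(java.parameters = "-Xmx2048m")
library(rJava)
.jinit()  # starts the JVM; does nothing new if one is already running

# Ask the running JVM how much heap it is allowed to use, in MB.
runtime <- .jcall("java/lang/Runtime", "Ljava/lang/Runtime;", "getRuntime")
max_heap_mb <- .jcall(runtime, "J", "maxMemory") / 1024^2
max_heap_mb
```

If the reported value is far below what was requested, the JVM was most likely started before the option was set. JVM parameters are fixed at start-up, which is why setting java.parameters after library(xlsx) has no effect.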

+9
Jan 13 '16 at 17:26

I also had the same error with both xlsx::read.xlsx and XLConnect::readWorksheetFromFile . Perhaps you can use RODBC::odbcDriverConnect and RODBC::sqlFetch , which go through a Microsoft ODBC driver and can be much more efficient.

+3
Jun 24 '15 at 14:25

As mentioned in the canonical Excel->R question, a recent alternative is the readxl package, which I have found to be quite fast compared with, for example, openxlsx and xlsx .

That said, there is a definite limit on spreadsheet size, past which you are probably better off saving the thing as .csv and using fread .
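
Both routes can be sketched as follows, assuming the readxl and data.table packages are installed. The example workbook that ships with readxl stands in for the large file, and xlsx_path, csv_path, preview and dt are illustrative names:

```r
library(readxl)
library(data.table)

# readxl ships with small example workbooks; datasets.xlsx stands in for the big file.
xlsx_path <- readxl_example("datasets.xlsx")

# Route 1: read directly with readxl (n_max limits the number of data rows returned).
preview <- read_excel(xlsx_path, sheet = 1, n_max = 5)

# Route 2: go through CSV. The CSV is written from R here only to keep the
# example runnable; in practice it would be exported from Excel with "Save As".
csv_path <- tempfile(fileext = ".csv")
fwrite(read_excel(xlsx_path, sheet = 1), csv_path)
dt <- fread(csv_path)
```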

+3
Jul 30 '15 at 22:52

I agree with @orville jackson's answer, and it really helped me.

Building on the answer provided by @orville jackson, here is a detailed description of how you can use openxlsx to read and write large files.

When the data size is small, R has many packages and functions that can be used to suit your requirements.

write.xlsx, write.xlsx2 and XLConnect also do the job, but they are sometimes slow compared to openxlsx.

So, if you are dealing with large data sets and have run into Java errors, I would suggest looking at "openxlsx", which is really awesome and, in my tests, cut the time by 1/12th.

I tested all of them, and in the end I was impressed with the performance of openxlsx.

Below are the steps to write multiple datasets onto multiple sheets.

    install.packages("openxlsx")
    library("openxlsx")

    start.time <- Sys.time()

    # Creating large data frames
    x <- as.data.frame(matrix(1:4000000, 200000, 20))
    y <- as.data.frame(matrix(1:4000000, 200000, 20))
    z <- as.data.frame(matrix(1:4000000, 200000, 20))

    # Creating a workbook
    # (note: the first argument of createWorkbook is the creator name, not a file name;
    # the output file is chosen later in saveWorkbook)
    wb <- createWorkbook("Example.xlsx")
    Sys.setenv("R_ZIPCMD" = "C:/Rtools/bin/zip.exe")  ## path to zip.exe

The call Sys.setenv("R_ZIPCMD" = "C:/Rtools/bin/zip.exe") has to use a fixed path, because it points to the zip utility that ships with Rtools.

Note: In case Rtools is not installed on your system, install it first for a smooth experience. Here is the link for your reference (select the appropriate version): https://cran.r-project.org/bin/windows/Rtools/

Check the settings as shown in the image linked below (you need to tick the checkbox during installation): https://cloud.githubusercontent.com/assets/7400673/12230758/99fb2202-b8a6-11e5-82e6-836159440831.png

    # Adding worksheets: parameters for addWorksheet are
    # 1. workbook object  2. sheet name
    addWorksheet(wb, "Sheet 1")
    addWorksheet(wb, "Sheet 2")
    addWorksheet(wb, "Sheet 3")

    # Writing data into the respective sheets: parameters for writeData are
    # 1. workbook object  2. sheet index / sheet name  3. data frame
    writeData(wb, 1, x)

    # In case you would like the sheet to have filters for ease of access,
    # pass withFilter = TRUE to writeData.
    writeData(wb, 2, x = y, withFilter = TRUE)

    ## Similarly, writeDataTable is another way of presenting your data, with table formatting:
    writeDataTable(wb, 3, z)

    saveWorkbook(wb, file = "Example.xlsx", overwrite = TRUE)

    end.time <- Sys.time()
    time.taken <- end.time - start.time
    time.taken

The openxlsx package is really good for reading and writing huge data from/to Excel files, and it has many options for custom formatting in Excel.

A nice side effect is that we do not have to worry about Java heap memory here, since openxlsx does not use Java.

+3
Mar 30 '17 at 12:56

@flodel's suggestion of converting to CSV seems the simplest. If for some reason that is not an option, you can read the file in chunks:

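    require(XLConnect)

    chnksz <- 2e3                    # rows per chunk
    s <- <sheet>
    wb <- loadWorkbook(<file>)
    tot.rows <- getLastRow(wb, s)    # index of the last non-empty row on the sheet

    for (i in seq(ceiling(tot.rows / chnksz))) {
        first.row <- (i - 1) * chnksz + 1
        last.row  <- min(i * chnksz, tot.rows)
        # header = FALSE keeps the first row of every chunk as data;
        # if the sheet has a header, it is the first row of the first chunk
        next.batch <- readWorksheet(wb, s, startRow = first.row, endRow = last.row, header = FALSE)
        # optionally save next.batch to disk or
        # assign it to a list. See which works for you.
    }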
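
The same chunking pattern can be tried without Java by using the rows argument of openxlsx::read.xlsx. This is a sketch under assumptions, not part of the original answer: the demo workbook, chunk_size and n_data_rows are invented for illustration, and whether it lowers peak memory for a given file depends on how the package parses the sheet, so treat it as something to test:

```r
library(openxlsx)

# Small demo workbook so the example runs on its own (stand-in for the real file).
demo_file <- tempfile(fileext = ".xlsx")
write.xlsx(data.frame(id = 1:25, value = rnorm(25)), demo_file)

chunk_size  <- 10   # data rows per chunk
n_data_rows <- 25   # known here; for a real file this has to be determined first

# Column names live in row 1 of the sheet.
header <- names(read.xlsx(demo_file, sheet = 1, rows = 1:2))

chunks <- list()
for (start in seq(2, n_data_rows + 1, by = chunk_size)) {
  end <- min(start + chunk_size - 1, n_data_rows + 1)
  # colNames = FALSE so the first row of each chunk is kept as data
  chunk <- read.xlsx(demo_file, sheet = 1, rows = start:end, colNames = FALSE)
  names(chunk) <- header
  chunks[[length(chunks) + 1]] <- chunk
}
all_rows <- do.call(rbind, chunks)
```

In a real workflow each chunk would be processed or written to disk inside the loop instead of being collected in memory.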
+2
Oct 03 '13 at 0:34

I found this thread while looking for an answer to the same question. Instead of trying to hack the xlsx file from within R, I ended up converting the file to .csv using Python and then importing it into R using the standard scan function.

Check out: https://github.com/dilshod/xlsx2csv
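
The conversion step can also be driven from R. The helper below is a hypothetical sketch, assuming the xlsx2csv command-line tool is installed and on the PATH; read_via_xlsx2csv is an invented name, and read.csv is used here instead of scan for brevity:

```r
# Hypothetical helper: convert with the xlsx2csv command-line tool, then read the CSV.
# Assumes xlsx2csv is installed and on the PATH (for example via pip).
read_via_xlsx2csv <- function(xlsx_path, csv_path = tempfile(fileext = ".csv")) {
  exe <- Sys.which("xlsx2csv")
  if (!nzchar(exe)) stop("xlsx2csv was not found on the PATH")

  # Command-line usage: xlsx2csv <input.xlsx> <output.csv>
  status <- system2(exe, args = c(shQuote(xlsx_path), shQuote(csv_path)))
  if (status != 0) stop("xlsx2csv failed with exit status ", status)

  read.csv(csv_path, stringsAsFactors = FALSE)
}
```

It would be called as mydata <- read_via_xlsx2csv("MyBigFile.xlsx").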

0
Jun 19 '14 at


