How to read multiple xlsx files in R using a loop with specific rows and columns

I need to read multiple xlsx files with random names into a single data file. The structure of each file is the same. I need to import only certain columns.

I tried this:

dat <- read.xlsx("FILE.xlsx", sheetIndex=1, 
                  sheetName=NULL, startRow=5, 
                  endRow=NULL, as.data.frame=TRUE, 
                  header=TRUE)

But this reads only one file at a time, and I could not find a way to select specific columns. I even tried:

site=list.files(pattern='[.]xls')

but the loop I wrote after that does not work. How can I do this? Thanks in advance.

3 answers

I would read each file into a list:

Get file names:

f = list.files("./", pattern = "\\.xlsx$")  # only the xlsx files, not every file in the folder

Reading files:

library(xlsx)

dat = lapply(f, function(i){
    x = read.xlsx(i, sheetIndex=1, sheetName=NULL, startRow=5,
        endRow=NULL, as.data.frame=TRUE, header=TRUE)
    # Get the columns you want, e.g. 1, 3, 5
    x = x[, c(1, 3, 5)]
    # You may want to add a column to say which file they're from
    x$file = i
    # Return your data
    x
})

Then you can access the items in your list:

dat[[1]]

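Or apply the same operation to each of them:

# colMeans needs numeric input, so drop the character 'file' column first
lapply(dat, function(x) colMeans(x[sapply(x, is.numeric)]))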

Combine them into a single data frame (assuming they all have the same structure):

dat = do.call("rbind.data.frame", dat)
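
The question also asked how to import only certain columns. As a rough sketch, the steps above could be wrapped in one helper that passes the column numbers to the colIndex argument of read.xlsx, so that only those columns are read. The helper name read_selected and its arguments are made up for illustration, and the columns 1, 3, 5 are just the example used above.

```r
# Hypothetical helper: read the same sheet region from several xlsx files
# and stack the results into one data frame.
read_selected <- function(files, cols, start_row = 5) {
  parts <- lapply(files, function(path) {
    # colIndex restricts the import to the requested columns
    x <- xlsx::read.xlsx(path, sheetIndex = 1, startRow = start_row,
                         colIndex = cols, header = TRUE)
    x$file <- basename(path)  # remember which file each row came from
    x
  })
  do.call(rbind.data.frame, parts)
}

# Usage: only runs if there are xlsx files in the working directory
files <- list.files(pattern = "\\.xlsx$")
if (length(files) > 0) {
  dat <- read_selected(files, cols = c(1, 3, 5))
}
```

This assumes every file has the same layout, as stated in the question; rbind will fail if the column names differ between files.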

With a for loop, if you prefer:

filelist <- list.files(pattern = "\\.xlsx$") # list all xlsx files

allxlsx.files <- list()  # create a list to populate with xlsx data (if you want to bind all the rows together)
count <- 1
for (file in filelist) {
   dat <- read.xlsx(file, sheetIndex=1, 
              sheetName=NULL, startRow=5, 
              endRow=NULL, as.data.frame=TRUE, 
              header=TRUE) [c(5:10, 12,15)] # index your columns of interest
   allxlsx.files[[count]] <- dat # store this file's data frame in the list
   count <- count + 1
}

Convert the list into a single data.frame:

allfiles <- do.call(rbind.data.frame, allxlsx.files)
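
To see what that last step does, here is a small sketch with two made-up data frames standing in for the per-file results. The column names id and value are invented for illustration and do not come from the question's files.

```r
# Two toy data frames with identical columns, as if read from two files
a <- data.frame(id = 1:2, value = c(10, 20))
b <- data.frame(id = 3:4, value = c(30, 40))
parts <- list(a, b)

# do.call passes every list element as a separate argument,
# so this is the same as rbind.data.frame(a, b)
combined <- do.call(rbind.data.frame, parts)
print(combined)
```

The result has four rows, two from each input. This only works when every element of the list has the same column names.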

A variation on Wyldsoul's answer, using a for loop over multiple worksheets (1 to j) in the same Excel file and binding the results with dplyr:

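library(gdata)
library(dplyr)

# f is the path to the workbook and j the number of worksheets; set both beforehand
allxlsx.files <- list()  # one data frame per worksheet
count <- 1
for (i in 1:j) {
  dat <- read.xls(f, sheet = i)
  dat <- dat[, 1:14] # index your columns of interest
  allxlsx.files[[count]] <- dat # store this worksheet's data
  count <- count + 1
}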

allfiles <- do.call(bind_rows, allxlsx.files)
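
One practical difference from rbind is that bind_rows matches columns by name and fills any that are missing with NA, which helps when the worksheets are not perfectly identical. A minimal sketch with made-up data (the column names and the "sheet" label are invented for illustration):

```r
library(dplyr)

# Two toy data frames whose columns only partly overlap
a <- data.frame(id = 1:2, value = c(10, 20))
b <- data.frame(id = 3L, note = "extra")

# bind_rows accepts the list directly; .id adds a column saying
# which list element each row came from
combined <- bind_rows(list(a, b), .id = "sheet")
print(combined)
```

Here the row from b has NA in value, and the rows from a have NA in note.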