Repeating functions on sequentially labeled data frames

Question

A question that is undoubtedly easy for an R expert to solve.

I need to repeat a series of functions on data frames that are sequentially labeled (before combining them all together). For example, I might need to do the following:

    library(data.table)

    # READ IN DATAFILES & LABEL DF
    df1 <- read.csv(file="file_A.csv", header=TRUE)
    df2 <- read.csv(file="file_B.csv", header=TRUE)
    df3 <- read.csv(file="file_C.csv", header=TRUE)

    # TURN DF INTO DATA TABLES
    df1 <- data.table(df1)
    df2 <- data.table(df2)
    df3 <- data.table(df3)

    # CHANGE VARIABLE TO POSIX
    df1$date <- as.POSIXct(df1$date, format = "%Y-%m-%d %H:%M:%S")
    df2$date <- as.POSIXct(df2$date, format = "%Y-%m-%d %H:%M:%S")
    df3$date <- as.POSIXct(df3$date, format = "%Y-%m-%d %H:%M:%S")

    # FILTER BY DATE RANGE
    date_filter <- as.POSIXct("2012-01-01 01:01:01")
    df1 <- subset(df1, df1$date > date_filter)
    df2 <- subset(df2, df2$date > date_filter)
    df3 <- subset(df3, df3$date > date_filter)

    # AGGREGATE OVER A UNIQUE ID
    df1 <- df1[, (sum(var)), by=list(id)]
    df2 <- df2[, (sum(var)), by=list(id)]
    df3 <- df3[, (sum(var)), by=list(id)]

    # FINALLY, MERGE TOGETHER
    df <- merge(df1, df2, by="id", all=TRUE)
    df <- merge(df, df3, by="id", all=TRUE)

You get the idea, only I need to do this for 25 data frames, not 3. I suspect I can use R's looping constructs by creating a vector ( df_nums<-c(1:25) ) and then looping over all my data frames, but I don't know how to do this.

Please help! Thanks!

Edit: Thanks to Arun, I settled on this for my actual code:

    out <- lapply(1:length(files), function(idx) {
        df <- as.data.table(read.csv(files[idx], header = TRUE))
        df$date <- as.POSIXct(df$date, format = "%Y-%m-%d %H:%M:%S")
        date_filter <- as.POSIXct("2012-11-13 01:01:01")
        df <- subset(df, df$date > date_filter)
        df <- df[, .N, by = list(id)]
    })
    out <- data.table(out)
    out.merge <- Reduce(function(...) merge(..., by="id", all=T), out)

Edit 2: After running the above code, I get what look like data.tables nested in out. For instance,

    > head(out)
                out
    1: <data.table>
    2: <data.table>
    3: <data.table>
    4: <data.table>
    5: <data.table>
    6: <data.table>

How do I access these data.tables to make sure everything is working correctly?

roody Jan 21 '13 at 15:43

1 answer

Arun · Accepted Answer · 2013-01-21T15:47:01+0000

You can use list.files to get all CSV files from a directory and use lapply to loop over them as follows:

    # Thanks Matthew for correcting the pattern string
    files <- list.files("path_to_files", full.names = TRUE, pattern="\\.csv$")
    out <- lapply(1:length(files), function(idx) {
        df <- as.data.table(read.csv(files[idx], header = TRUE))
        df$date <- as.POSIXct(df$date, format = "%Y-%m-%d %H:%M:%S")
        date_filter <- as.POSIXct("2012-01-01 01:01:01")
        df <- subset(df, df$date > date_filter)
        df <- df[, (sum(var)), by = list(id)]
    })
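To try the pattern without real data, here is a minimal self-contained sketch. It assumes the data.table package is installed. The demo files, the `demo_dir` directory, the `process_file` helper, and the `total` column name are hypothetical and only serve the illustration; the per-file steps mirror the ones above.

```r
library(data.table)

# Hypothetical demo data: write three small CSV files to a temporary directory.
demo_dir <- file.path(tempdir(), "demo_csv")
dir.create(demo_dir, showWarnings = FALSE)
for (nm in c("file_A", "file_B", "file_C")) {
  demo <- data.frame(
    id   = c(1, 1, 2),
    date = c("2011-12-31 23:00:00", "2012-03-01 12:00:00", "2012-06-01 08:30:00"),
    var  = c(10, 20, 30)
  )
  write.csv(demo, file.path(demo_dir, paste0(nm, ".csv")), row.names = FALSE)
}

# One function holds the per-file steps (hypothetical helper name).
process_file <- function(path, date_filter) {
  dt <- as.data.table(read.csv(path, header = TRUE, stringsAsFactors = FALSE))
  # Parse the timestamp column; the time zone is fixed here only to keep the demo reproducible.
  dt[, date := as.POSIXct(date, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")]
  # Keep rows after the cutoff, then sum var within each id.
  dt <- dt[date > date_filter]
  dt[, .(total = sum(var)), by = id]
}

date_filter <- as.POSIXct("2012-01-01 01:01:01", tz = "UTC")
files <- list.files(demo_dir, full.names = TRUE, pattern = "\\.csv$")

# lapply returns a list with one aggregated data.table per file.
out <- lapply(files, process_file, date_filter = date_filter)
names(out) <- basename(files)
```

Passing the file paths directly to lapply, rather than indices, avoids the `files[idx]` lookup; naming the list by file makes it easier to tell the results apart later.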

You can use do.call(rbind, out) or do.call(cbind, out) to bind all results by rows or by columns.
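As a small illustration of the row-wise option, the sketch below stacks two made-up results. The tables and the `total` and `source` column names are hypothetical stand-ins for whatever lapply returned. Note that cbind lines rows up by position rather than by id, so it is probably only suitable when every result has the same ids in the same order.

```r
library(data.table)

# Hypothetical stand-ins for the per-file results in `out`.
out <- list(
  data.table(id = c(1, 2), total = c(20, 30)),
  data.table(id = c(2, 3), total = c(5, 7))
)

# Stack all results by rows.
stacked <- do.call(rbind, out)

# rbindlist from data.table does the same job and can add a column
# recording which list element each row came from.
stacked_with_source <- rbindlist(out, idcol = "source")
```

If the list is named (for example by file name), rbindlist uses those names in the `source` column instead of integer positions.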

Edit: Following @roody's question about an outer join, something like this?

 out.merge <- Reduce(function(...) merge(..., by="id", all=T), out)
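For the question in Edit 2: lapply returns a plain list, so each result can be inspected with `out[[1]]`, `out[[2]]`, and so on. The `<data.table>` entries appear because wrapping the list in data.table() turns it into a single list column, so skipping that step and passing the list straight to Reduce is likely what is wanted. The sketch below uses made-up tables; the `sum_1`, `sum_2`, ... column names and the `out_named` variable are hypothetical, chosen so that the merged columns do not all share the default name V1.

```r
library(data.table)

# Hypothetical per-file results; in practice these come from lapply().
out <- list(
  data.table(id = c(1, 2), V1 = c(20, 30)),
  data.table(id = c(2, 3), V1 = c(5, 7)),
  data.table(id = c(1, 3), V1 = c(4, 9))
)

# Individual results are reached with [[ ]].
print(out[[1]])

# Rename the value column of each table (on a copy) so names stay distinct.
out_named <- Map(
  function(dt, i) setnames(copy(dt), "V1", paste0("sum_", i)),
  out, seq_along(out)
)

# Full outer join on id; ids missing from a table yield NA in its column.
out.merge <- Reduce(function(x, y) merge(x, y, by = "id", all = TRUE), out_named)
print(out.merge)
```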
