R: collapse column values containing NA by sum when grouped by identifiers

Question


I have a data frame like this:

    ID <- c("A","A","A","A","B","B","B","B")
    Type <- c(45,45,46,46,45,45,46,46)
    Point_A <- c(10,NA,30,40,NA,80,NA,100)
    Point_B <- c(NA,32,43,NA,65,11,NA,53)
    df <- data.frame(ID,Type,Point_A,Point_B)

      ID Type Point_A Point_B
    1  A   45      10      NA
    2  A   45      NA      32
    3  A   46      30      43
    4  A   46      40      NA
    5  B   45      NA      65
    6  B   45      80      11
    7  B   46      NA      NA
    8  B   46     100      53

From what I found in this post, I can collapse the data when there is one identifier and one value column.

I am currently using sqldf to sum the rows, grouping by ID and Type. Although this does the job for me, it is very slow on a larger dataset.

    library(sqldf)
    df1 <- sqldf("SELECT ID, Type, Sum(Point_A) as Point_A, Sum(Point_B) as Point_B
                  FROM df GROUP BY ID, Type")

Please suggest any other methods that would help solve this problem. I have started learning the dplyr and plyr packages and find them very interesting, but I don't know how to apply them here.

Desired output

      ID Type Point_A Point_B
    1  A   45      10      32
    2  A   46      70      43
    3  B   45      80      76
    4  B   46     100      53

+5

r aggregate data.table dplyr plyr

Sharath May 14, '15 at 22:50


2 answers

    library(data.table)
    DT <- as.data.table(df)
    DT[, lapply(.SD, sum, na.rm=TRUE), by=list(ID, Type)]

       ID Type Point_A Point_B
    1:  A   45      10      32
    2:  A   46      70      43
    3:  B   45      80      76
    4:  B   46     100      53
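If the data frame carries other columns that should not be summed, the same data.table call can be restricted with `.SDcols`. The following is only a sketch under that assumption; `value_cols` and `res_dt` are names made up for illustration, and the data is rebuilt from the question so the snippet runs on its own.

```r
library(data.table)

# Sample data from the question
df <- data.frame(
  ID      = c("A","A","A","A","B","B","B","B"),
  Type    = c(45,45,46,46,45,45,46,46),
  Point_A = c(10,NA,30,40,NA,80,NA,100),
  Point_B = c(NA,32,43,NA,65,11,NA,53)
)
DT <- as.data.table(df)

# Hypothetical helper: the columns to be summed
value_cols <- c("Point_A", "Point_B")

# .SD holds only the columns listed in .SDcols for each (ID, Type) group
res_dt <- DT[, lapply(.SD, sum, na.rm = TRUE),
             by = .(ID, Type), .SDcols = value_cols]
print(res_dt)
```

Without `.SDcols`, `.SD` contains every column that is not used in `by`, which is why the shorter call above works for this particular data frame.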

+9

Ricardo Saporta May 14, '15 at 23:00


Steven Beaupré · Accepted Answer · 2015-05-14T23:12:07+0000

Using dplyr:

    df %>%
      group_by(ID, Type) %>%
      summarise_each(funs(sum(., na.rm = T)))

or

    df %>%
      group_by(ID, Type) %>%
      summarise(Point_A = sum(Point_A, na.rm = T),
                Point_B = sum(Point_B, na.rm = T))

or

    f <- function(x) sum(x, na.rm = T)
    df %>%
      group_by(ID, Type) %>%
      summarise(Point_A = f(Point_A), Point_B = f(Point_B))

Which gives:

    #Source: local data frame [4 x 4]
    #Groups: ID
    #
    #  ID Type Point_A Point_B
    #1  A   45      10      32
    #2  A   46      70      43
    #3  B   45      80      76
    #4  B   46     100      53
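In later dplyr releases `summarise_each()` and `funs()` are deprecated, and `across()` (available from dplyr 1.0.0) covers the same case. The following is a sketch of how the first variant might look there, assuming dplyr 1.0.0 or newer; `res` is just an illustrative name, and the data is rebuilt from the question so the snippet is self-contained.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  ID      = c("A","A","A","A","B","B","B","B"),
  Type    = c(45,45,46,46,45,45,46,46),
  Point_A = c(10,NA,30,40,NA,80,NA,100),
  Point_B = c(NA,32,43,NA,65,11,NA,53)
)

# Sum both value columns per (ID, Type) group, ignoring NA values;
# .groups = "drop" returns an ungrouped result
res <- df %>%
  group_by(ID, Type) %>%
  summarise(across(c(Point_A, Point_B), ~ sum(.x, na.rm = TRUE)),
            .groups = "drop")
print(res)
```

One thing to keep in mind with any of these variants: `sum(x, na.rm = TRUE)` returns 0 for a group whose values are all `NA`. That does not occur in the sample data, but it may matter on a larger dataset.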
