We can use rleid from devel version data.table ie v1.9.5 +. Installation instructions for the devel: here version.
We will convert 'data.frame' to 'data.table'. Create another grouping column "ind" from the "key" column. Grouped by "id" and "ind", we get the sum of "value" and get the first element "key". We can set 'ind' to NULL because it is not needed in the expected output.
library(data.table) setDT(df1)[,list(value = sum(value), key=key[1L]), by = .(ind=rleid(key), id)][, ind:=NULL][] # id value key #1: 1 3000 A #2: 1 2001 B #3: 1 4006 A #4: 2 2002 B #5: 2 2002 C
Or, as @Frank suggested, we can use mutliple columns in rleid , use it as a grouping variable, get the first element of other variables and sum from the value, set the unwanted column to NULL or use the standard subset of data.table using with = FALSE .
setDT(df1)[, list(id=id[1L], key=key[1L], value=sum(value)) , by = .(r=rleid(id, key))][, r:= NULL][] # id key value #1: 1 A 3000 #2: 1 B 2001 #3: 1 A 4006 #4: 2 B 2002 #5: 2 C 2002
Or we can use dplyr . We create the grouping variable 'ind' by comparing adjacent elements of 'key' and get the sum from the value 'and first element' key 'with summarise .
library(dplyr) df1 %>% group_by(ind= cumsum(key!=lag(key, default=TRUE)), id) %>% summarise(value=sum(value), key=first(key)) %>% ungroup() %>% select(-ind) # id value key #1 1 3000 A #2 1 2001 B #3 1 4006 A #4 2 2002 B #5 2 2002 C
NOTE. In dplyr and data.table we can also put the key column as a grouping variable and delete key=key[1L] or key=first(key)) .
Or we transform data set by creating the "ind" column and use aggregate from base R to get the expected output
df1 <- transform(df1, ind = cumsum(c(TRUE,head(key,-1)!=tail(key,-1)))) aggregate(value~., df1, FUN=sum)[-3]
data
df1 <- structure(list(id = c(1L, 1L, 1L, 1L, 1L, 2L, 2L), key = c("A", "A", "B", "A", "A", "B", "C"), value = c(1000L, 2000L, 2001L, 2002L, 2004L, 2002L, 2002L)), .Names = c("id", "key", "value" ), class = "data.frame", row.names = c(NA, -7L))