Calculate column sums for each combination of two grouping variables

Question

I have a dataset that looks something like this:

    Type Age count1 count2 Year Pop1  Pop2  TypeDescrip
    A    35  1      1      1990 30000 50000 alpha
    A    35  3      1      1990 30000 50000 alpha
    A    45  2      3      1990 20000 70000 alpha
    B    45  2      1      1990 20000 70000 beta
    B    45  4      5      1990 20000 70000 beta

I want to sum the count columns across all rows that share the same Type and Age. Ideally, I would end up with a dataset that looks like this:

    Type Age count1 count2 Year Pop1  Pop2  TypeDescrip
    A    35  4      2      1990 30000 50000 alpha
    A    45  2      3      1990 20000 70000 alpha
    B    45  6      6      1990 20000 70000 beta

I tried nesting calls to duplicated(), as below:

    typedup = duplicated(df$Type)
    bothdup = duplicated(df[(typedup == TRUE),]$Age)

but this returns the indices where Age or Type is duplicated, not necessarily the rows where both are duplicated together.

I also tried using:

 tapply(c(df$count1, df$count2), c(df$Age, df$Type), sum)

but the output is hard to work with. I want to end up with a data.frame.

I do not want to use a for loop because my dataset is quite large.

+6

r aggregate

heo Jul 02 '15 at 17:25

2 answers

@hannah You can also use SQL via the sqldf package:

    library(sqldf)
    sqldf("select Type, Age, sum(count1) as sum_count1, sum(count2) as sum_count2
           from df group by Type, Age")

+1

Ajay ohri Jul 02 '15 at 17:40

akrun · Accepted Answer · 2015-07-02T17:26:19+0000

Try

    library(dplyr)
    df1 %>%
        group_by(Type, Age) %>%
        summarise_each(funs(sum))
    #   Type Age count1 count2
    # 1    A  35      4      2
    # 2    A  45      2      3
    # 3    B  45      6      6
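
In dplyr 1.0.0 and later, summarise_each() and funs() are deprecated in favour of across(). A minimal sketch of the same grouping, assuming a recent dplyr is installed; the data frame is rebuilt inline (same values as df1 in the data section) so the snippet runs on its own, and the name `res` is only illustrative:

```r
library(dplyr)

# Same values as df1 in the data section of this answer
df1 <- data.frame(
  Type   = c("A", "A", "A", "B", "B"),
  Age    = c(35L, 35L, 45L, 45L, 45L),
  count1 = c(1L, 3L, 2L, 2L, 4L),
  count2 = c(1L, 1L, 3L, 1L, 5L),
  stringsAsFactors = FALSE
)

# Sum every non-grouping column within each Type/Age combination
res <- df1 %>%
  group_by(Type, Age) %>%
  summarise(across(everything(), sum), .groups = "drop")
```

Inside a grouped summarise(), everything() does not select the grouping columns, so only count1 and count2 are summed here.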

Or using base R

    aggregate(. ~ Type + Age, df1, FUN = sum)
    #   Type Age count1 count2
    # 1    A  35      4      2
    # 2    A  45      2      3
    # 3    B  45      6      6

or

    library(data.table)
    setDT(df1)[, lapply(.SD, sum), .(Type, Age)]
    #    Type Age count1 count2
    # 1:    A  35      4      2
    # 2:    A  45      2      3
    # 3:    B  45      6      6

Update

Based on a new dataset

    df2 %>%
        group_by(Type, Age, Pop1, Pop2, TypeDescrip) %>%
        summarise_each(funs(sum), matches('^count'))
    #   Type Age  Pop1  Pop2 TypeDescrip count1 count2
    # 1    A  35 30000 50000       alpha      4      2
    # 2    A  45 20000 70000        beta      2      3
    # 3    B  45 20000 70000        beta      6      6
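
The same idea can be sketched in base R by naming the count columns on the left-hand side of the aggregate() formula with cbind(), so that the character column TypeDescrip is used for grouping rather than summed. The data frame `dat` and the result name `res` are stand-ins for illustration; `dat` is rebuilt from the table in the question, not taken from the df2 object in this answer:

```r
# Stand-in data copied from the table in the question
dat <- data.frame(
  Type        = c("A", "A", "A", "B", "B"),
  Age         = c(35L, 35L, 45L, 45L, 45L),
  count1      = c(1L, 3L, 2L, 2L, 4L),
  count2      = c(1L, 1L, 3L, 1L, 5L),
  Year        = 1990L,
  Pop1        = c(30000L, 30000L, 20000L, 20000L, 20000L),
  Pop2        = c(50000L, 50000L, 70000L, 70000L, 70000L),
  TypeDescrip = c("alpha", "alpha", "alpha", "beta", "beta"),
  stringsAsFactors = FALSE
)

# Sum only count1 and count2; every variable on the right-hand side defines a group
res <- aggregate(
  cbind(count1, count2) ~ Type + Age + Year + Pop1 + Pop2 + TypeDescrip,
  data = dat, FUN = sum
)

# Order rows by the two main grouping variables for a predictable layout
res <- res[order(res$Type, res$Age), ]
```

Rows are kept separate whenever any grouping column differs, so this collapses to one row per Type/Age pair only when the other descriptive columns are constant within each pair.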

data

    df1 <- structure(list(Type = c("A", "A", "A", "B", "B"),
        Age = c(35L, 35L, 45L, 45L, 45L),
        count1 = c(1L, 3L, 2L, 2L, 4L),
        count2 = c(1L, 1L, 3L, 1L, 5L)),
        .Names = c("Type", "Age", "count1", "count2"),
        class = "data.frame", row.names = c(NA, -5L))

    df2 <- structure(list(Type = c("A", "A", "A", "B", "B"),
        Age = c(35L, 35L, 45L, 45L, 45L),
        count1 = c(1L, 3L, 2L, 2L, 4L),
        count2 = c(1L, 1L, 3L, 1L, 5L),
        Year = c(1990L, 1990L, 1990L, 1990L, 1990L),
        Pop1 = c(30000L, 30000L, 20000L, 20000L, 20000L),
        Pop2 = c(50000L, 50000L, 70000L, 70000L, 70000L),
        TypeDescrip = c("alpha", "alpha", "beta", "beta", "beta")),
        .Names = c("Type", "Age", "count1", "count2", "Year", "Pop1",
            "Pop2", "TypeDescrip"),
        class = "data.frame", row.names = c(NA, -5L))
