Aggregate data in one column based on values in another column

I know there is an easy way to do this ... but I cannot figure it out.

I have a dataframe in my R script that looks something like this:

   A B C
 1.2 4 8
 2.3 4 9
 2.3 6 0
 1.2 3 3
 3.4 2 1
 1.2 5 1

Here A, B, and C are the column names. I'm trying to compute values like this:

 sum1 <- [the sum of all B values such that A is 1.2]
 num1 <- [the number of times A is 1.2]

Any easy way to do this? I basically want to get a data frame that looks like this:

   A num totalB
 1.2   3     12
 etc etc    etc

Where "num" is the number of times a given value of A appears, and "totalB" is the sum of the B values for that value of A.

r aggregate dataframe
4 answers

I would use aggregate() to compute the two summaries separately, and then merge() them into one data frame:

 > df
     A B C
 1 1.2 4 8
 2 2.3 4 9
 3 2.3 6 0
 4 1.2 3 3
 5 3.4 2 1
 6 1.2 5 1
 > num <- aggregate(B~A,df,length)
 > names(num)[2] <- 'num'
 > totalB <- aggregate(B~A,df,sum)
 > names(totalB)[2] <- 'totalB'
 > merge(num,totalB)
     A num totalB
 1 1.2   3     12
 2 2.3   2     10
 3 3.4   1      2
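
As a possible variation on the approach above, both summaries can also be computed in a single aggregate() call by having the function return a named vector. This is only a minimal sketch on the same sample data; the name `res` and the anonymous helper function are illustrative and not part of the original answer.

```r
# Sample data from the question
df <- data.frame(A = c(1.2, 2.3, 2.3, 1.2, 3.4, 1.2),
                 B = c(4, 4, 6, 3, 2, 5),
                 C = c(8, 9, 0, 3, 1, 1))

# FUN returns two values per group, so aggregate() stores them
# as a two-column matrix inside the single column B
res <- aggregate(B ~ A, data = df,
                 FUN = function(b) c(num = length(b), totalB = sum(b)))

# Flatten the matrix column into ordinary columns and rename them
res <- do.call(data.frame, res)
names(res) <- c("A", "num", "totalB")
res
```

The do.call(data.frame, ...) step is needed because the matrix column would otherwise print as B.num and B.totalB while still being one column of the data frame.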

Here is a solution using the plyr package:

 library(plyr)  # attach plyr so that .() and summarize are found
 ddply(df, .(A), summarize, num = length(A), totalB = sum(B))

Here is a solution using data.table, which can save memory and time on large data:

 library(data.table)
 DT <- as.data.table(df)
 DT[, list(totalB = sum(B), num = .N), by = A]

To aggregate only the rows where C == 1 (as per the comment on @aix's answer):

 DT[C==1, list(totalB = sum(B), num = .N), by = A] 

In dplyr:

 library(tidyverse)
 A <- c(1.2, 2.3, 2.3, 1.2, 3.4, 1.2)
 B <- c(4, 4, 6, 3, 2, 5)
 C <- c(8, 9, 0, 3, 1, 1)
 df <- tibble(A, B, C)  # tibble() replaces the deprecated data_frame()
 df %>%
   group_by(A) %>%
   summarise(num = n(), totalB = sum(B))