Calculate the relative frequency for a specific group

I have a data.frame of categorical variables, which I divided into groups, and I got the counts for each group.

My original data nyD looks like:

    Source: local data frame [7 x 3]
    Groups: v1, v2, v3

      v1    v2   v3
    1  a  plus  yes
    2  a  plus  yes
    3  a minus   no
    4  b minus  yes
    5  b     x  yes
    6  c     x notk
    7  c     x notk

I performed the following operations using dplyr:

    ny1 <- nyD %>%
      group_by(v1, v2, v3) %>%
      summarise(count = n()) %>%
      mutate(prop = count/sum(count))

My data "ny1" looks like:

    Source: local data frame [5 x 5]
    Groups: v1, v2

      v1    v2   v3 count prop
    1  a minus   no     1    1
    2  a  plus  yes     2    1
    3  b minus  yes     1    1
    4  b     x  yes     1    1
    5  c     x notk     2    1

I want the prop variable to hold the relative frequency with respect to the v1 groups. That is, prop must be each count divided by the total number of samples in its v1 group. The v1 groups contain 3 "a", 2 "b" and 2 "c" in total. So ny1$prop[1] <- 1/3, ny1$prop[2] <- 2/3, and so on. The mutate operation using count/sum(count) gives the wrong result here. I need to specify that the sum should be taken over the v1 group only. Is there a way to achieve this with dplyr?

1 answer

You can do it all in one step (starting from your nyD source data, without creating ny1). This works because summarise drops one grouping level by default (the last one, here v2), which is certainly my favorite feature in dplyr. The mutate that follows therefore computes sum(count) within v1 only:

    nyD %>%
      group_by(v1, v2) %>%
      summarise(count = n()) %>%
      mutate(prop = count/sum(count))

    # Source: local data frame [5 x 4]
    # Groups: v1
    #
    #   v1    v2 count      prop
    # 1  a minus     1 0.3333333
    # 2  a  plus     2 0.6666667
    # 3  b minus     1 0.5000000
    # 4  b     x     1 0.5000000
    # 5  c     x     2 1.0000000
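If you want to try this yourself, here is a minimal reproducible sketch. The data frame is rebuilt by hand from the printout in the question; I am assuming rows 5 to 7 read as v1 = "b"/"c" with v2 = "x". The names res and check are only for illustration.

```r
library(dplyr)

# Sample data rebuilt from the question's printout (assumed values).
nyD <- data.frame(
  v1 = c("a", "a", "a", "b", "b", "c", "c"),
  v2 = c("plus", "plus", "minus", "minus", "x", "x", "x"),
  v3 = c("yes", "yes", "no", "yes", "yes", "notk", "notk"),
  stringsAsFactors = FALSE
)

res <- nyD %>%
  group_by(v1, v2) %>%
  summarise(count = n()) %>%          # result is still grouped by v1
  mutate(prop = count / sum(count))   # so sum(count) is taken per v1

# Sanity check: the proportions inside each v1 group should add up to 1.
check <- res %>% summarise(total = sum(prop))
```

Newer dplyr versions may print a message about the remaining grouping after summarise; that is informational and does not change the result.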

Or a shorter version using count (thanks to @beginneR)

    nyD %>%
      count(v1, v2) %>%
      mutate(prop = n/sum(n))
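One caveat with the count version: it relies on the result of count still being grouped by v1. As far as I know, newer dplyr releases return an ungrouped result from count when the input is ungrouped, in which case n/sum(n) would divide by the overall total instead. A sketch that does not depend on leftover grouping states the denominator group explicitly. The data frame is rebuilt from the question's printout, and res2 is just an illustrative name.

```r
library(dplyr)

# Sample data rebuilt from the question's printout (assumed values).
nyD <- data.frame(
  v1 = c("a", "a", "a", "b", "b", "c", "c"),
  v2 = c("plus", "plus", "minus", "minus", "x", "x", "x"),
  stringsAsFactors = FALSE
)

res2 <- nyD %>%
  count(v1, v2) %>%             # adds column n: rows per (v1, v2) pair
  group_by(v1) %>%              # make v1 the denominator group explicitly
  mutate(prop = n / sum(n)) %>%
  ungroup()                     # drop grouping so later steps are not surprised
```

The explicit group_by(v1) costs one extra line but makes the intent readable without knowing how summarise or count treat groups.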