Using dplyr to create a pivot table of proportions with several categorical / factor variables

Question

I am trying to create a single table that summarizes several categorical variables (using frequencies and proportions) by another variable. I would like to do this using the dplyr package.

These previous discussions partially cover this: Relative frequencies / proportions with dplyr and Calculate relative frequency for a specific group.

Using the mtcars dataset, this is what it would look like if I only wanted to see the proportion of gear by am:

    mtcars %>%
      group_by(am, gear) %>%
      summarise(n = n()) %>%
      mutate(freq = n / sum(n))

    #   am gear  n      freq
    # 1  0    3 15 0.7894737
    # 2  0    4  4 0.2105263
    # 3  1    4  8 0.6153846
    # 4  1    5  5 0.3846154

However, what I actually want is to look not only at gear by am, but also at carb by am and cyl by am, separately, in the same table. If I change the code to:

    mtcars %>%
      group_by(am, gear, carb, cyl) %>%
      summarise(n = n()) %>%
      mutate(freq = n / sum(n))

I get frequencies for each combination of am , gear , carb and cyl . This is not what I want. Is there a way to do this with dplyr?

EDIT

Also, it would be an added bonus if someone knew of a way to create the table I want, but with the am categories as columns (as in the classic 2x2 table format). Here is an example of what I mean, taken from one of my previous publications. I want to create this table in R so that I can output it directly to a text document using RMarkdown:

r dplyr

Rnb Jan 4 '16 at 8:40

2 answers

One way to solve this problem is to reshape your data into a long(er) format. You can then use the same code to calculate the desired results, with one additional grouping variable in group_by:

    library(reshape2)
    library(dplyr)

    m_mtcars <- melt(mtcars, measure.vars = c("gear", "carb", "cyl"))

    res <- m_mtcars %>%
      group_by(am, variable, value) %>%
      summarise(n = n()) %>%
      mutate(freq = n / sum(n))
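A note on why freq comes out per am and variable: by default, summarise() drops the last grouping level (value here), so the mutate() that follows divides within the remaining am / variable groups. The sketch below spells that denominator out explicitly and checks it, assuming the same mtcars data and packages as above. The names res_explicit and check are illustrative only, and the .groups argument assumes dplyr 1.0.0 or later.

```r
library(reshape2)
library(dplyr)

m_mtcars <- melt(mtcars, measure.vars = c("gear", "carb", "cyl"))

res_explicit <- m_mtcars %>%
  count(am, variable, value) %>%   # n per am / variable / value
  group_by(am, variable) %>%       # denominator: all cars with this am, per variable
  mutate(freq = n / sum(n)) %>%
  ungroup()

# Each am / variable block of proportions should sum to 1
check <- res_explicit %>%
  group_by(am, variable) %>%
  summarise(total = sum(freq), .groups = "drop")
stopifnot(all(abs(check$total - 1) < 1e-9))
```

Writing the group_by() before the mutate() explicitly makes the denominator visible and does not depend on the order of the variables in the earlier grouping.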

Starting from res, the desired output can be obtained with some further reshaping and string formatting.

    # make an 'export' variable
    res$export <- with(res, sprintf("%i (%.1f%%)", n, freq * 100))

    # reshape again
    output <- dcast(variable + value ~ am, value.var = "export", data = res, fill = "missing")
    # use drop = FALSE to prevent silent missings
    output$variable <- as.character(output$variable)

    # make 'empty lines'
    empties <- data.frame(variable = unique(output$variable), stringsAsFactors = FALSE)
    empties[, colnames(output)[-1]] <- ""

    # bind them together
    output2 <- rbind(empties, output)
    output2 <- output2[order(output2$variable, output2$value), ]

    # optional: 'remove' variable if value present
    output2$variable[output2$value != ""] <- ""

This leads to:

       variable value          0         1
    2      carb
    7               1  3 (15.8%) 4 (30.8%)
    8               2  6 (31.6%) 4 (30.8%)
    9               3  3 (15.8%)   missing
    10              4  7 (36.8%) 3 (23.1%)
    11              6    missing  1 (7.7%)
    12              8    missing  1 (7.7%)
    3       cyl
    13              4  3 (15.8%) 8 (61.5%)
    14              6  4 (21.1%) 3 (23.1%)
    15              8 12 (63.2%) 2 (15.4%)
    1      gear
    4               3 15 (78.9%)   missing
    5               4  4 (21.1%) 8 (61.5%)
    6               5    missing 5 (38.5%)
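Since the question mentions RMarkdown: one possible way to print a table like this in a knitted document is knitr::kable(). This is only a small sketch, assuming the knitr package is installed. The data frame tab is typed in by hand from the gear rows of the output above purely for illustration; in practice output2 would be passed instead.

```r
library(knitr)

# Hand-copied gear block from the output above (illustration only)
tab <- data.frame(
  variable = c("gear", "", "", ""),
  value    = c("", "3", "4", "5"),
  `0`      = c("", "15 (78.9%)", "4 (21.1%)", "missing"),
  `1`      = c("", "missing", "8 (61.5%)", "5 (38.5%)"),
  check.names = FALSE,        # keep "0" and "1" as column names
  stringsAsFactors = FALSE
)

# row.names = FALSE hides the leftover row numbers from the reshaping steps
md <- kable(tab, format = "markdown", row.names = FALSE)
print(md)
```

In an RMarkdown chunk, calling kable() as the last expression is usually enough for the table to be rendered in the output document.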

Heroka Jan 4 '16 at 8:56

Gopala · Accepted Answer · 2016-01-04T13:34:02+0000

With the tidyr / dplyr combination, here is how you do it:

    library(tidyr)
    library(dplyr)

    mtcars %>%
      gather(variable, value, gear, carb, cyl) %>%
      group_by(am, variable, value) %>%
      summarise(n = n()) %>%
      mutate(freq = n / sum(n))
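For the wide layout asked for in the question's EDIT (one column per am level), the long result can be formatted and spread back out. A rough sketch, assuming tidyr 1.0.0 or later, where pivot_longer() and pivot_wider() supersede gather() and spread(). The cell column, the am_ prefix and the "-" placeholder for empty combinations are illustrative choices, not part of the answer above.

```r
library(tidyr)
library(dplyr)

wide <- mtcars %>%
  # long format: one row per car per summarised variable
  pivot_longer(c(gear, carb, cyl), names_to = "variable", values_to = "value") %>%
  count(am, variable, value) %>%
  group_by(am, variable) %>%
  # "n (percent%)" label, with the percentage taken within am / variable
  mutate(cell = sprintf("%d (%.1f%%)", n, 100 * n / sum(n))) %>%
  ungroup() %>%
  select(-n) %>%
  # one column per am level; combinations that never occur get "-"
  pivot_wider(names_from = am, values_from = cell,
              names_prefix = "am_", values_fill = list(cell = "-")) %>%
  arrange(variable, value)

print(wide)
```

The result has one row per variable / value pair and the columns am_0 and am_1, which can then be passed to a table printer of your choice.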
