Creating a contingency table using multiple columns in a data frame in R

I have a data frame that looks like this:

    structure(list(ab = c(0, 1, 1, 1, 1, 0, 0, 0, 1, 1),
                   bc = c(1, 1, 1, 1, 0, 0, 0, 1, 0, 1),
                   de = c(0, 0, 1, 1, 1, 0, 1, 1, 0, 1),
                   cl = c(1, 2, 3, 1, 2, 3, 1, 2, 3, 2)),
              .Names = c("ab", "bc", "de", "cl"),
              row.names = c(NA, -10L),
              class = "data.frame")

The cl column indicates cluster membership, and the variables ab, bc and de carry binary answers, where 1 indicates yes and 0 indicates no.

I am trying to cross-tabulate the cluster against all the other columns in the data frame (ab, bc and de), so that the clusters become the columns and the variables become the rows. The desired result looks like this:

       1 2 3
    ab 1 3 2
    bc 2 3 1
    de 2 3 1

I tried the following code:

 with(newdf, tapply(newdf[,c(3)], cl, sum)) 

This gives me the cross-tabulated values for only one column at a time. My real data frame has 1600+ columns plus one cluster column. Can anyone help?

4 answers

Your data is in a semi-long, semi-wide format, and you want it in a fully wide format. This is easiest if we first convert it to a fully long format:

    library(reshape2)
    df_long = melt(df, id.vars = "cl")
    head(df_long)
    #   cl variable value
    # 1  1       ab     0
    # 2  2       ab     1
    # 3  3       ab     1
    # 4  1       ab     1
    # 5  2       ab     1
    # 6  3       ab     0

Then we can cast it to wide format, using sum as the aggregation function:

    dcast(df_long, variable ~ cl, fun.aggregate = sum)
    #   variable 1 2 3
    # 1       ab 1 3 2
    # 2       bc 2 3 1
    # 3       de 2 3 1

One way, using dplyr:

    library(dplyr)
    df %>%
      # group by the variable cl
      group_by(cl) %>%
      # sum every column
      summarize_each(funs(sum)) %>%
      # select the three needed columns
      select(ab, bc, de) %>%
      # transpose the df
      t()

Output:

       [,1] [,2] [,3]
    ab    1    3    2
    bc    2    3    1
    de    2    3    1
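In newer dplyr releases, summarize_each() and funs() are deprecated. A minimal sketch of the same idea using across(), assuming dplyr 1.0.0 or later and rebuilding the question's data as df; the names sums and res are just illustrative:

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  ab = c(0, 1, 1, 1, 1, 0, 0, 0, 1, 1),
  bc = c(1, 1, 1, 1, 0, 0, 0, 1, 0, 1),
  de = c(0, 0, 1, 1, 1, 0, 1, 1, 0, 1),
  cl = c(1, 2, 3, 1, 2, 3, 1, 2, 3, 2)
)

# Sum every non-grouping column within each cluster
sums <- df %>%
  group_by(cl) %>%
  summarise(across(everything(), sum))

# Transpose so the variables become rows, then label the columns
# with the cluster ids instead of [,1] [,2] [,3]
res <- t(as.matrix(select(sums, -cl)))
colnames(res) <- sums$cl
res
```

Because across(everything(), ...) skips the grouping column, this should scale to the 1600+ column case without listing the variables by name.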

In base R:

    t(sapply(data[,1:3], function(x) tapply(x, data[,4], sum)))
    #    1 2 3
    # ab 1 3 2
    # bc 2 3 1
    # de 2 3 1
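With 1600+ columns, hard-coding the column positions gets awkward. One possible variation, still in base R, is to select every column except cl by name and let rowsum() do the per-cluster sums. This is only a sketch on the question's data (rebuilt here as df; vars and res are illustrative names):

```r
# Example data from the question
df <- data.frame(
  ab = c(0, 1, 1, 1, 1, 0, 0, 0, 1, 1),
  bc = c(1, 1, 1, 1, 0, 0, 0, 1, 0, 1),
  de = c(0, 0, 1, 1, 1, 0, 1, 1, 0, 1),
  cl = c(1, 2, 3, 1, 2, 3, 1, 2, 3, 2)
)

# Every column except the cluster column
vars <- setdiff(names(df), "cl")

# rowsum() sums each column within each cluster (one row per cluster);
# t() flips the result so variables are rows and clusters are columns
res <- t(rowsum(df[vars], group = df$cl))
res
```

The result should be the same 3 x 3 table as above, with the cluster ids as column names.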

You can also combine tidyr::gather or reshape2::melt with xtabs to get a contingency table:

    library(tidyr)
    xtabs(value ~ key + cl, data = gather(df, key, value, -cl))
    ##     cl
    ## key  1 2 3
    ##   ab 1 3 2
    ##   bc 2 3 1
    ##   de 2 3 1

If you prefer to use the pipe:

    df %>%
      gather(key, value, -cl) %>%
      xtabs(value ~ key + cl, data = .)
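In current tidyr, gather() is superseded by pivot_longer(). A rough equivalent of the same approach, assuming tidyr 1.0.0 or later and the question's data rebuilt as df (long and res are illustrative names):

```r
library(tidyr)

# Example data from the question
df <- data.frame(
  ab = c(0, 1, 1, 1, 1, 0, 0, 0, 1, 1),
  bc = c(1, 1, 1, 1, 0, 0, 0, 1, 0, 1),
  de = c(0, 0, 1, 1, 1, 0, 1, 1, 0, 1),
  cl = c(1, 2, 3, 1, 2, 3, 1, 2, 3, 2)
)

# Reshape to long format: one row per (cluster, variable, value)
long <- pivot_longer(df, cols = -cl, names_to = "key", values_to = "value")

# Sum the values for each variable/cluster combination
res <- xtabs(value ~ key + cl, data = long)
res
```

The names_to and values_to arguments are set to "key" and "value" only so that the xtabs formula stays the same as in the gather() version.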