Combining variables in a dataframe

Question

I have a data frame like this:

KEY C1  C2  C3  C4
A   0   0   1   0
B   0   0   1   0
C   0   1   1   0
D   0   0   1   0
E   1   0   1   0
F   1   0   0   0
G   0   1   0   0
H   0   0   1   0
I   0   1   1   0
J   1   0   0   1

and would like to build a matrix from it that counts the rows in which two variables both have the value 1.

I do not want to count rows that contain more than two 1s, such as:

KEY C1  C2  C3  C4
L   1   0   1   1

or fewer than two:

M   1   0   0  0

The output should be a frequency table.

   C1 C2 C3 C4
C1 3  0  1  1
C2 0  3  2  0
C3 1  2  7  0
C4 1  0  0  1

There may be more variables, up to C20, and of course more rows. Thanks for your help!

+1

r

Przemo Jun 12 '15 at 16:07

2 answers

It looks like you want to subset the rows first. Try the following:

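# Read the data; "file1.csv" stands in for your own file
df  <- read.csv("file1.csv")

# Keep only rows with exactly two 1s (the first column is KEY)
df2 <- subset(df, rowSums(df[,-1]) == 2)

# Variables as rows, keys as column names
m1 <- t(df2[-1])
colnames(m1) <- df2[,1]

# m1 %*% t(m1): counts of rows where both variables are 1
tcrossprod(m1)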

This gives

#     C1 C2 C3 C4
# C1  2  0  1  1
# C2  0  2  2  0
# C3  1  2  3  0
# C4  1  0  0  1
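Why this works: with 0/1 data, entry [i, j] of the product is the number of rows in which column i and column j are both 1, and the diagonal counts the rows in which that column is 1. Below is a minimal sketch that checks this by hand on the question's data. The names `x`, `by_product` and `by_hand` are illustrative and do not come from the original answer, and `crossprod(x)` is used here as the equivalent of `tcrossprod(t(x))`.

```r
# Question's data, rebuilt inline so the sketch is self-contained
df1 <- data.frame(
  KEY = LETTERS[1:10],
  C1 = c(0, 0, 0, 0, 1, 1, 0, 0, 0, 1),
  C2 = c(0, 0, 1, 0, 0, 0, 1, 0, 1, 0),
  C3 = c(1, 1, 1, 1, 1, 0, 0, 1, 1, 0),
  C4 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 1)
)

# Keep rows with exactly two 1s, as a plain numeric matrix
x <- as.matrix(df1[rowSums(df1[-1]) == 2, -1])

# Matrix-product version: t(x) %*% x
by_product <- crossprod(x)

# Hand count: rows where column i and column j are both 1
vars <- colnames(x)
by_hand <- sapply(vars, function(j) {
  sapply(vars, function(i) sum(x[, i] == 1 & x[, j] == 1))
})

# Both approaches should agree
same <- isTRUE(all.equal(by_product, by_hand, check.attributes = FALSE))
print(same)
```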

+2

Aaron katch Jun 12 '15 at 16:22

akrun · Accepted Answer · 2015-06-12T16:09:48+0000

Try

 m1 <- t(df1[-1])
 colnames(m1) <- df1[,1]
 tcrossprod(m1)
 #   C1 C2 C3 C4
 #C1  3  0  1  1
 #C2  0  3  2  0
 #C3  1  2  7  0
 #C4  1  0  0  1

As for the subsetting part, it does not give me the expected result from the question:

 df1 <- df1[rowSums(df1[-1])==2,]
 m1 <- t(df1[-1])
 colnames(m1) <- df1[,1]
 tcrossprod(m1)
 #   C1 C2 C3 C4
 #C1  2  0  1  1
 #C2  0  2  2  0
 #C3  1  2  3  0
 #C4  1  0  0  1
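Since the question mentions up to 20 variables, both variants can be wrapped in one small function. This is only a sketch: `cooccurrence`, `key` and `n_ones` are hypothetical names, and it assumes every column other than the key holds 0/1 values.

```r
# Hypothetical helper (not from the original answers): co-occurrence counts
# for 0/1 indicator columns, optionally keeping only rows with n_ones 1s.
cooccurrence <- function(df, key = 1, n_ones = NULL) {
  x <- as.matrix(df[-key])                 # drop the key column
  if (!is.null(n_ones)) {
    x <- x[rowSums(x) == n_ones, , drop = FALSE]
  }
  crossprod(x)                             # t(x) %*% x
}

# Question's data, rebuilt inline so the sketch is self-contained
df1 <- data.frame(
  KEY = LETTERS[1:10],
  C1 = c(0, 0, 0, 0, 1, 1, 0, 0, 0, 1),
  C2 = c(0, 0, 1, 0, 0, 0, 1, 0, 1, 0),
  C3 = c(1, 1, 1, 1, 1, 0, 0, 1, 1, 0),
  C4 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 1)
)

full       <- cooccurrence(df1)              # all rows
pairs_only <- cooccurrence(df1, n_ones = 2)  # rows with exactly two 1s

diag(full)        # 3 3 7 1
diag(pairs_only)  # 2 2 3 1
```

Working on the numeric matrix directly avoids the transpose and the `colnames<-` step, because `crossprod` takes its dimnames from the column names.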

data

df1 <- structure(list(KEY = c("A", "B", "C", "D", "E", "F", "G", "H", 
"I", "J"), C1 = c(0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 1L), C2 = c(0L, 
0L, 1L, 0L, 0L, 0L, 1L, 0L, 1L, 0L), C3 = c(1L, 1L, 1L, 1L, 1L, 
0L, 0L, 1L, 1L, 0L), C4 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
1L)), .Names = c("KEY", "C1", "C2", "C3", "C4"), class = "data.frame", 
row.names = c(NA, -10L))
