Create a table with all pairs of values from one column in R, counting unique values

I have data that shows which customers have purchased which items. A customer can purchase the same item several times. I need a table that shows every possible pair of items together with the number of unique customers who purchased that pair (the diagonal of the table is simply the number of unique customers who bought each item).

Here is an example:

    item <- c("h","h","h","j","j")
    customer <- c("a","a","b","b","b")
    test.data <- data.frame(item,customer)

Here is test.data:

      item customer
    1    h        a
    2    h        a
    3    h        b
    4    j        b
    5    j        b

The required result is a table with the items as row and column names, where each cell holds the number of unique customers who bought that pair. Here, 2 customers purchased item h, 1 purchased both h and j, and 1 purchased item j.

    item   h   j
    h      2   1
    j      1   1

I tried using the table function, melt / cast, etc., but nothing produces the counts I need. My first step is to use unique() to get rid of duplicate rows.

2 answers

Using the data.table and gtools packages, we can generate all possible ordered pairs of items for each customer:

    library(data.table)
    library(gtools)

    item <- c("h","h","h","j","j")
    customer <- c("a","a","b","b","b")
    test.data <- data.table(item,customer)

    # unique() is used because multiple purchases do not count twice
    DT <- unique(test.data)

    # All ordered pairs (with repeats) of the items in x
    tuples <- function(x){
      return(data.frame(permutations(length(x), 2, x, repeats.allowed = TRUE, set = FALSE),
                        stringsAsFactors = FALSE))
    }

    DO <- DT[, tuples(item), by = customer]

This gives:

       customer X1 X2
    1:        a  h  h
    2:        b  h  h
    3:        b  h  j
    4:        b  j  h
    5:        b  j  j

This is the list of all unique pairs of items that each customer has bought. As in your example, the pair h x j is treated as distinct from j x h. Now we can get the frequency of each pair using the table function:

    table(DO$X1,DO$X2)

        j h
      j 1 1
      h 1 2
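For readers without gtools, the per-customer pairing step can be sketched in base R with expand.grid. This is only an illustration of the idea, not the answer's own code; the helper name `pairs_for` is hypothetical, and the data frame is rebuilt from the question so the block runs on its own.

```r
# Purchases from the question, deduplicated so repeat purchases count once
test.data <- data.frame(
  item     = c("h", "h", "h", "j", "j"),
  customer = c("a", "a", "b", "b", "b"),
  stringsAsFactors = FALSE
)
DT <- unique(test.data)

# Hypothetical helper: all ordered pairs (with repeats) of one customer's items
pairs_for <- function(x) expand.grid(X1 = x, X2 = x, stringsAsFactors = FALSE)

# Build the pairs for each customer and stack them into one data frame
DO <- do.call(rbind, lapply(split(DT$item, DT$customer), pairs_for))

# Cross-tabulate: each cell counts the customers who bought both items
counts <- table(DO$X1, DO$X2)
counts
```

Because each customer's items are unique after deduplication, every customer contributes each pair at most once, so the cell counts are counts of customers.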

Here's a base R solution:

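    # Number of customers two items have in common, vectorized for use with outer()
    n_intersect <- Vectorize( function(x,y) length(intersect(x,y)) )

    # Unique customers per item
    cs_by_item <- with(test.data, tapply(customer, item, unique))

    outer(cs_by_item , cs_by_item , n_intersect)
    #   h j
    # h 2 1
    # j 1 1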
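The same counts can also be cross-checked with a different technique, which is not part of either answer: build a customer-by-item incidence matrix and multiply it by its own transpose with crossprod. A minimal sketch, assuming the question's test.data (the names `incidence` and `pair_counts` are illustrative):

```r
test.data <- data.frame(
  item     = c("h", "h", "h", "j", "j"),
  customer = c("a", "a", "b", "b", "b")
)

# Customer-by-item incidence matrix: 1 if the customer bought the item at all
incidence <- table(unique(test.data)[, c("customer", "item")])

# t(incidence) %*% incidence: cell (i, j) counts customers who bought both i and j
pair_counts <- crossprod(incidence)
pair_counts
```

The diagonal holds the number of unique customers per item, and the off-diagonal cells hold the number of customers who bought both items, matching the table requested in the question.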
