Matching and counting character strings in R

I have a vector of character strings made up of combinations of the four letters J, K, Q and Z. Each entry consists of at least two and at most four letters. For example: data <- c("QK", "KQ", "JKQZ", "KJZ").

I would like to count how many times each entry occurs, without distinguishing between strings that consist of the same letters in a different order. I know that table(data) does not do this, because it does not treat QK and KQ as the same, and returns:

data
JKQZ  KJZ   KQ   QK 
   1    1    1    1 

I looked at pmatch and charmatch, but they don't seem to do what I want.

EDIT: I should clarify that no entry contains a repeated letter; entries such as ZZ or KZK cannot occur.
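Because no letter repeats, each entry is effectively a set of letters, so it can be encoded as an integer bitmask whose value does not depend on letter order. A minimal sketch, assuming the alphabet is exactly J, K, Q, Z (the names letters4, set_code and decode are hypothetical, chosen for illustration):

```r
# Assumption: the alphabet is exactly these four letters (hypothetical name letters4).
letters4 <- c("J", "K", "Q", "Z")
data <- c("QK", "KQ", "JKQZ", "KJZ")

# The letter at position i contributes 2^(i - 1); the sum ignores order,
# so "QK" and "KQ" both map to 2 + 4 = 6.
set_code <- vapply(strsplit(data, ""), function(x) {
  as.integer(sum(2^(match(x, letters4) - 1)))
}, integer(1))

counts <- table(set_code)

# Decode each bitmask back into a sorted letter string for readable names:
# bit i is set when (code %/% 2^(i - 1)) is odd.
decode <- function(code) {
  paste(letters4[(code %/% c(1L, 2L, 4L, 8L)) %% 2L == 1L], collapse = "")
}
names(counts) <- vapply(as.integer(names(counts)), decode, character(1))
counts
```

This relies on the no-repeats guarantee: a string such as KZK would be encoded incorrectly, since the sum would count K twice.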

2 answers

First, make a table of letter counts for each observation (converting the letters to a factor so that absent letters get zero cells), then hash each row of the table and count the hashes:

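library(magrittr)  # provides the %>% pipe
library(digest)    # provides digest() for hashing
data <- c("QK", "KQ", "JKQZ", "KJZ")
# Split each string into letters; a factor with fixed levels gives zero counts for absent letters
tbl <- strsplit(data, "") %>% lapply(factor, levels = c("K", "Q", "J", "Z")) %>%
  lapply(table) %>% do.call(what = rbind)
tbl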

which gives the following:

     K Q J Z
[1,] 1 1 0 0
[2,] 1 1 0 0
[3,] 1 1 1 1
[4,] 1 0 1 1

Then hash and count:

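# Hash each row; identical letter sets give identical hashes
h <- apply(tbl, 1, digest)
# Look up each row's hash frequency and append it as a count column
tbl <- cbind(tbl, count = as.vector(table(h)[h]))
# Keep one row per distinct letter set
tbl <- tbl[!duplicated(h), ]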

Here is the result:

     K Q J Z count
[1,] 1 1 0 0     2
[2,] 1 1 1 1     1
[3,] 1 0 1 1     1
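The digest() hash only serves as a grouping key. A minimal base-R sketch of the same idea without the magrittr and digest packages, assuming the same four letters, pastes each row of the 0/1 table into a string such as "1100" and groups on that instead (the names lv, key and res are illustrative):

```r
data <- c("QK", "KQ", "JKQZ", "KJZ")
lv <- c("K", "Q", "J", "Z")  # assumed letter order, as in the answer above

# One row per observation, one column per letter (fixed factor levels keep zeros).
tbl <- do.call(rbind, lapply(strsplit(data, ""), function(x) table(factor(x, levels = lv))))

# Row key such as "1100"; identical letter sets give identical keys.
key <- apply(tbl, 1, paste, collapse = "")

# Frequency of each row's key, appended as a column; then keep one row per set.
count <- as.vector(table(key)[key])
res <- cbind(tbl, count = count)[!duplicated(key), ]
res
```

The pasted key is as unique as the hash here because every row has the same fixed width, so no two different rows can produce the same string.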

Here's a longer variation on David's comment / answer:

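# Distinct letters present in the data, sorted
vals    <- sort(unique(unlist(strsplit(data,''))))
# Every sorted combination of 1 to length(vals) letters, used as factor levels so absent sets get 0
combos  <- unlist(sapply(seq_along(vals),function(i)combn(vals,i,paste0,collapse="")))
# Sort the letters inside each string so "QK" and "KQ" both become "KQ"
newdata <- factor(sapply(strsplit(data,""),function(x)paste0(sort(x),collapse="")),
             levels=combos)
tab <- table(newdata)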
# newdata
#    J    K    Q    Z   JK   JQ   JZ   KQ   KZ   QZ  JKQ  JKZ  JQZ  KQZ JKQZ 
#    0    0    0    0    0    0    0    2    0    0    0    1    0    0    1 
tab[tab > 0]  # alternatively, show only the letter sets that occur
#   KQ  JKZ JKQZ 
#    2    1    1 
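If the zero counts for unseen combinations are not needed, the combos/levels step can be dropped entirely. A minimal sketch of that shorter form, using only base R (the name key is illustrative):

```r
data <- c("QK", "KQ", "JKQZ", "KJZ")

# Canonical key: split each string into letters, sort them, paste them back.
key <- vapply(strsplit(data, ""), function(x) paste(sort(x), collapse = ""), character(1))

counts <- table(key)
counts
```

Unlike the factor version, the result lists only the letter sets that actually occur, and their order follows sort() on the keys rather than the combination order.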
