Permutations of a vector with duplicates in R

I am using R to generate the permutations of a vector that contains repeated elements.

In these permutations, each number represents a group. For small cases, this works:

library(combinat)  # provides permn()
unlist(unique(permn(c(1,1,2,2,3,3,4,4), paste0, collapse = "")))

This returns a vector of 2520 unique permutations (8! / (2!)^4).
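The 2520 figure is the multinomial count for a multiset: n! divided by the factorial of each value's multiplicity. A small base-R sketch of that formula (the helper name `multiset_count` is hypothetical, not from any package):

```r
# Hypothetical helper: number of distinct orderings of a multiset whose
# values occur freqs[1], freqs[2], ... times, i.e. n! / (f1! f2! ... fk!).
multiset_count <- function(freqs) {
  # round() guards against floating-point error, since R computes
  # factorial() via gamma().
  round(factorial(sum(freqs)) / prod(factorial(freqs)))
}

multiset_count(rep(2, 4))  # 8! / (2!)^4 = 2520
multiset_count(rep(4, 4))  # all 16 values: 16! / (4!)^4 = 63063000
```

This only counts full-length permutations; it does not cover the "choose 11 of 16" case, where the multiplicities of the chosen elements vary.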

The problem is that I am trying to scale this up to 16 elements taken 11 at a time, so that I get every unique permutation of length 11 drawn from c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4). To get each combination first, I do:

combs = unique(combn(c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4),11, paste0, collapse = ""))

and then I would go through them, permute each one and combine the results to get all unique "16 choose 11" permutations.

Does that sound like a huge number?

It shouldn't be: in theory it is only 525,525 rows (16! / (5! 4! 4! 4! 4!)). The problem is that this method has to compute all 174,356,582,400 rows (approximately 174 billion), in chunks of about 39.9 million (11!) per combination, and then run unique on them.
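The work figures quoted for the combine-then-permute approach can be reproduced with base-R arithmetic. This sketch only recomputes how many rows that approach generates before unique() runs; it uses prod(1:n) so the factorials stay exact integers in double precision.

```r
# Rows generated by the combn() + permute + unique() approach.
n_combs  <- choose(16, 11)      # 4368 raw combinations (before unique)
per_comb <- prod(1:11)          # 11! = 39916800 orderings of each one
total    <- n_combs * per_comb  # 174,356,582,400 rows before deduplication

# Same number as 16! / (16 - 11)!, the count of 11-permutations of
# 16 distinct items.
total == prod(1:16) / prod(1:5)  # TRUE
```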

Is there a method that takes the repeated labels into account while generating the permutations, instead of removing the duplicates afterwards?

Looking at other methods, I see that this works:

library(gtools)  # provides permutations()
unique(permutations(16,11, c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4), set=FALSE))

except that it takes far too long, because it does the same thing as my approach above: it generates every permutation, duplicates included, and then removes the duplicates.

Answer

What you are looking for are permutations of a multiset.
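To show what a multiset permutation generator does differently from permn() or permutations() followed by unique(), here is a minimal base-R sketch. It is a hypothetical helper, not how RcppAlgos is implemented: at each position it only branches on the distinct values that still have copies left, so a duplicate permutation is never produced in the first place.

```r
# Hypothetical helper (not part of RcppAlgos or any package): all distinct
# permutations of length m of a multiset in which values[i] may be used at
# most freqs[i] times. Each row of the returned matrix is one permutation.
multiset_perms <- function(values, freqs, m = sum(freqs)) {
  stopifnot(length(values) == length(freqs), m <= sum(freqs))
  # Base case: exactly one way to fill zero positions (a 1 x 0 matrix).
  if (m == 0) return(matrix(values[0], nrow = 1, ncol = 0))
  out <- list()
  for (i in seq_along(values)) {
    # Branch only on distinct values with copies left, so identical
    # copies of a value are never swapped with each other.
    if (freqs[i] > 0) {
      left <- freqs
      left[i] <- left[i] - 1
      rest <- multiset_perms(values, left, m - 1)
      out[[length(out) + 1]] <- cbind(values[i], rest)
    }
  }
  do.call(rbind, out)
}

p <- multiset_perms(1:4, rep(2, 4))
nrow(p)  # 2520
p[1, ]   # 1 1 2 2 3 3 4 4
```

Because this recursion runs in interpreted R, it is only meant to illustrate the idea; for the "16 choose 11" case a compiled generator such as permuteGeneral is the practical choice.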

library(RcppAlgos)

multiPerm <- permuteGeneral(1:4, freqs = rep(2,4))

head(multiPerm)
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,]    1    1    2    2    3    3    4    4
[2,]    1    1    2    2    3    4    3    4
[3,]    1    1    2    2    3    4    4    3
[4,]    1    1    2    2    4    3    3    4
[5,]    1    1    2    2    4    3    4    3
[6,]    1    1    2    2    4    4    3    3

Sanity check:

library(combinat)
library(gtools)
OPTestOne <- unlist(unique(permn(c(1,1,2,2,3,3,4,4), paste0, collapse = "")))
all.equal(sort(apply(multiPerm, 1, paste, collapse="")), sort(OPTestOne))
[1] TRUE

OPTestTwo <- unique(permutations(8,8,c(1,1,2,2,3,3,4,4), set=FALSE))
all.equal(OPTestTwo, multiPerm)
[1] TRUE  

Here are some benchmarks:

library(microbenchmark)
microbenchmark(OP_One = unique(permn(c(1,1,2,2,3,3,4,4), paste0, collapse = "")),
               Algos = permuteGeneral(1:4, freqs = rep(2,4)),
               OP_Two = unique(permutations(8,8,c(1,1,2,2,3,3,4,4), set=FALSE)),
               times = 5, unit = "relative")
Unit: relative
  expr      min        lq      mean   median       uq       max neval
OP_One  8435.40  5570.476  5877.457 5562.094 5378.490  5409.687     5
 Algos     1.00     1.000     1.000    1.000    1.000     1.000     5
OP_Two 15335.55 10095.646 10700.802 9982.139 9539.425 10295.974     5

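Now for the actual question: to take 11 of the 16 elements at a time, use the m argument, with each of the four values available four times: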

system.time(multiPermChoose11 <- permuteGeneral(1:4, m = 11, freqs = rep(4, 4)))
 user  system elapsed 
0.154   0.023   0.178

head(multiPermChoose11)
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
[1,]    1    1    1    1    2    2    2    2    3     3     3
[2,]    1    1    1    1    2    2    2    3    2     3     3
[3,]    1    1    1    1    2    2    2    3    3     2     3
[4,]    1    1    1    1    2    2    2    3    3     3     2
[5,]    1    1    1    1    2    2    3    2    2     3     3
[6,]    1    1    1    1    2    2    3    2    3     2     3

Note that the OP's theoretical count (525,525) is not correct. The actual number of rows is:

nrow(multiPermChoose11)
[1] 2310000
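One way to see where 2,310,000 comes from, sketched in base R: group the length-11 permutations by how many 1s, 2s, 3s and 4s they use (each count between 0 and 4, summing to 11), then add up the multinomial count 11! / (n1! n2! n3! n4!) for each group. This is only a cross-check of the figure, not how permuteGeneral counts rows.

```r
# Every way to use n1 ones, n2 twos, n3 threes and n4 fours (0 to 4 each)
# in a length-11 permutation of c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4).
m <- 11
counts <- expand.grid(n1 = 0:4, n2 = 0:4, n3 = 0:4, n4 = 0:4)
counts <- as.matrix(counts[rowSums(counts) == m, ])

# Distinct orderings of each sub-multiset: 11! / (n1! n2! n3! n4!).
# round() removes floating-point error from factorial().
orderings <- round(factorial(m) / apply(factorial(counts), 1, prod))

nrow(counts)    # 52 distinct sub-multisets (what unique(combn(...)) keeps)
sum(orderings)  # 2310000
```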

And to confirm that every row is unique:

length(unique(apply(multiPermChoose11, 1, paste, collapse ="")))
[1] 2310000
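The unique() check above shows the rows are distinct, but not that each row respects the available multiplicities. A hypothetical base-R helper can test both at once; it assumes the values are the integers 1..length(freqs), as in the 1:4 examples.

```r
# Hypothetical helper: TRUE when no row of `mat` uses value j more than
# freqs[j] times and no two rows are identical.
valid_multiset_rows <- function(mat, freqs) {
  # Column r of `counts` holds how often each value 1..k appears in row r.
  counts <- apply(mat, 1, tabulate, nbins = length(freqs))
  all(counts <= freqs) && anyDuplicated(mat) == 0
}
```

Assuming multiPermChoose11 was created as above, valid_multiset_rows(multiPermChoose11, rep(4, 4)) should return TRUE; together with a row count that matches the expected total, this indicates the result is complete as well as duplicate-free.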

The same count can be obtained directly with np_multiset from the iterpc package:

iterpc::np_multiset(rep(4,4), 11)
[1] 2310000

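For an overview of permutations and combinations in R (with/without replacement, distinct/non-distinct items, multisets), see the related post by @RandyLai (author of the arrangements and iterpc packages).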

