Randomly sample groups

Question


For a data frame df with a column named group, how do you randomly select k groups from it with dplyr? It should return all rows belonging to those k groups (assuming df$group has at least k unique values), and every group in df should be equally likely to be selected.


r dplyr

Big dogg May 10, '16 at 21:56


2 answers

Although you asked for a dplyr solution, using dplyr here makes little sense to me; base R does the same job considerably faster:

library(dplyr)
library(microbenchmark)
microbenchmark(dplyr = iris %>% filter(Species %in% sample(levels(Species), 2)),
               base  = iris[iris[["Species"]] %in% sample(levels(iris[["Species"]]), 2), ])

Unit: microseconds
  expr     min      lq     mean  median       uq      max neval cld
 dplyr 660.287 710.655 753.6704 722.629 771.2860 1122.527   100   b
  base  83.629  95.032 110.0936 106.057 119.1715  199.949   100  a

Note that [[ is known to be faster than $, although both work here.
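The one-liner above hard-codes iris and Species. As a sketch, the same base R idea can be wrapped in a small function for the question's generic case; the name sample_groups_base and its col argument are hypothetical, not part of the answer. It uses unique() rather than levels(), because levels() returns NULL for a character column, and it indexes with sample.int() so that a single numeric group value is not misread by sample() as the range 1:n.

```r
# Hypothetical helper (not from the answer): return all rows of k randomly
# chosen groups in df[[col]].
sample_groups_base <- function(df, k, col = "group") {
  groups <- unique(df[[col]])  # works for factor, character and numeric columns
  if (length(groups) < k) {
    stop("column '", col, "' has fewer than ", k, " unique values")
  }
  # Sampling positions without replacement makes every set of k groups
  # equally likely; indexing avoids sample()'s special case for a length-1 numeric.
  chosen <- groups[sample.int(length(groups), k)]
  df[df[[col]] %in% chosen, , drop = FALSE]
}

# Usage with the iris data used in the answers:
set.seed(1)
two_species <- sample_groups_base(iris, 2, col = "Species")
table(droplevels(two_species$Species))
```

Each iris species has 50 rows, so any call with k = 2 returns 100 rows from two species.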


Alex w May 10 '16 at 23:01


Mrflick · Accepted Answer · 2016-05-10T22:04:43+0000

Just use sample() to select the groups, then filter() to keep all of their rows:

library(dplyr)
iris %>% filter(Species %in% sample(levels(Species), 2))
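For the question's own setup, a data frame with a group column, the accepted answer's pattern carries over directly. The sketch below assumes made-up example data (the df contents and k = 2 are illustrative only) and swaps levels() for unique() so that a character column works. It also runs a quick simulation to check that each group is picked about equally often, which is what sampling without replacement should give.

```r
library(dplyr)

# Hypothetical example data: group is character, so levels(group) would be NULL.
df <- data.frame(
  group = rep(c("a", "b", "c", "d"), times = c(3, 2, 4, 1)),
  value = 1:10
)
k <- 2

# sample() runs once over the whole column, then filter() keeps every row
# whose group is among the k sampled labels. ungroup() matters: on a grouped
# data frame, filter() would evaluate sample() separately within each group.
picked <- df %>%
  ungroup() %>%
  filter(group %in% sample(unique(group), k))
count(picked, group)

# Rough check of equal selection probability: each of the 4 groups should be
# chosen in about k / 4 = 50% of repeated draws.
set.seed(2016)
hits <- replicate(5000, unique(df$group) %in% sample(unique(df$group), k))
rowMeans(hits)
```

The rows of hits follow the order of unique(df$group), so rowMeans(hits) gives one selection rate per group, each close to 0.5.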

