Randomly sample groups

For a data frame df with a column named group, how do you randomly select k groups from it in dplyr? It should return all rows from the k selected groups (assuming df$group has at least k unique values), and every group in df should be equally likely to be selected.

2 answers

Just use sample() to pick the desired number of groups, then filter on them:

library(dplyr)
iris %>% filter(Species %in% sample(levels(Species), 2))
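The one-liner above relies on Species being a factor, since levels() returns NULL for a character column. A minimal sketch of the same idea for the df, group, and k named in the question follows; the helper name sample_groups and the toy data are made up for illustration and are not part of dplyr.

```r
library(dplyr)

# Hypothetical helper: keep all rows belonging to k randomly chosen groups.
sample_groups <- function(df, k) {
  groups <- unique(df$group)        # works for factor and character columns
  stopifnot(k <= length(groups))    # need at least k distinct groups
  # Sample positions rather than values, so that a single numeric group id
  # is not interpreted by sample() as the range 1:n.
  chosen <- groups[sample.int(length(groups), k)]
  df %>% filter(group %in% chosen)
}

# Toy data, invented for this example
df <- data.frame(group = rep(c("a", "b", "c", "d"), times = c(2, 3, 1, 4)),
                 value = 1:10)

res <- sample_groups(df, 2)
```

Because sample.int() draws without replacement and with equal probability by default, each group has the same chance of being selected regardless of how many rows it has.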

Although you asked for a dplyr solution, I see no reason to use dplyr here; base R does the same thing faster:

library(dplyr)
library(microbenchmark)
# Compare the dplyr pipeline with the equivalent base R subsetting
microbenchmark(dplyr = iris %>% filter(Species %in% sample(levels(Species), 2)),
               base = iris[iris[["Species"]] %in% sample(levels(iris[["Species"]]), 2), ])

Unit: microseconds
  expr     min      lq     mean  median       uq      max neval cld
 dplyr 660.287 710.655 753.6704 722.629 771.2860 1122.527   100   b
  base  83.629  95.032 110.0936 106.057 119.1715  199.949   100  a 

Note that [[ is known to be faster than $, although both work.
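For the df, group, and k from the question, the base R approach might look like the sketch below. The function name sample_groups_base and the toy data are assumptions made for illustration only.

```r
# Hypothetical base R helper: keep all rows from k randomly chosen groups.
sample_groups_base <- function(df, k) {
  groups <- unique(df[["group"]])   # distinct group values
  stopifnot(k <= length(groups))    # need at least k distinct groups
  # Draw k positions without replacement, each equally likely
  chosen <- groups[sample.int(length(groups), k)]
  # drop = FALSE keeps the result a data frame even with a single column
  df[df[["group"]] %in% chosen, , drop = FALSE]
}

# Toy data, invented for this example
df <- data.frame(group = rep(c("a", "b", "c", "d"), times = c(2, 3, 1, 4)),
                 value = 1:10)

res_base <- sample_groups_base(df, 3)
```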
