With the following sample data, I would like to draw a stratified random sample (for example, 40%) of the identifier "ID" from each level of the "Cohort" factor:
data <- structure(list(
  Cohort = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
             2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L),
  ID = structure(1:20, .Label = c("a1 ", "a2", "a3", "a4", "a5", "a6", "a7", "a8", "a9",
                                  "b10", "b11", "b12", "b13", "b14", "b15", "b16", "b17",
                                  "b18", "b19", "b20"), class = "factor")),
  .Names = c("Cohort", "ID"), class = "data.frame", row.names = c(NA, -20L))
I know how to draw a fixed number of random rows from each cohort using the following:
library(dplyr)
# size must not exceed the smallest group (Cohort 1 has only 9 rows)
data %>% group_by(Cohort) %>% sample_n(size = 5)
But my actual data is longitudinal: each ID appears in several rows within its cohort, and the cohorts differ in size. So I need to sample a fraction of the unique IDs within each cohort, rather than a fraction of the rows, and then keep every row that belongs to a sampled ID. Any help would be appreciated.
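To make the goal concrete, here is a minimal sketch of one possible approach, not necessarily the best one. It assumes dplyr >= 1.0.0 (for slice_sample) and uses a hypothetical longitudinal data frame, long_data, with a made-up Visit column and three rows per ID. The idea is to reduce the data to one row per (Cohort, ID), sample 40% of those rows within each cohort, and then keep all rows of the sampled IDs:

```r
library(dplyr)

set.seed(42)  # only so the example is reproducible

# Hypothetical longitudinal data (an assumption for illustration):
# 9 IDs in cohort 1, 11 IDs in cohort 2, each observed at 3 visits.
long_data <- data.frame(
  Cohort = rep(c(rep(1L, 9), rep(2L, 11)), each = 3),
  ID     = rep(c(paste0("a", 1:9), paste0("b", 10:20)), each = 3),
  Visit  = rep(1:3, times = 20)
)

# Step 1: one row per (Cohort, ID), then sample 40% of the IDs per cohort.
sampled_ids <- long_data %>%
  distinct(Cohort, ID) %>%
  group_by(Cohort) %>%
  slice_sample(prop = 0.4) %>%
  ungroup()

# Step 2: keep every row that belongs to a sampled ID.
sampled_data <- long_data %>%
  semi_join(sampled_ids, by = c("Cohort", "ID"))
```

As far as I can tell, slice_sample(prop = ) rounds the per-group count down, so small cohorts may end up with slightly fewer than 40% of their IDs.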