Matching vector values to records in a data frame in R

I have a vector r as follows:

  r<-c(1,3,4,6,7) 

and a data frame df with 21 records and two columns:

  id<-c(1,2,3,4,5,6,7,8,9,10,11,12,13,1,4,15,16,17,18,19,20)
  freq<-c(1,3,2,4,5,6,6,7,8,3,3,1,6,9,9,1,1,4,3,7,7)
  df<-data.frame(id,freq)

Using the vector r, I need to extract a sample of records from df (as a new data frame) such that the freq values of the selected records equal the values in r. If several records have the same freq value, one of them should be selected at random. For example, one possible result could be:

  id freq
  12    1
  10    3
   4    4
   7    6
   8    7

I would appreciate it if someone could help me with this.

3 answers

You can try data.table

 library(data.table)
 setDT(df)[freq %in% r, sample(id, 1L), freq]

Or using base R

 aggregate(id ~ freq, df, subset = freq %in% r, FUN = sample, 1L)
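
One caveat with both calls above: when a group holds a single numeric id n, sample(n, 1L) draws from 1:n instead of returning n, which is documented behavior of sample(). None of the groups in the example data has size one, so the results above are unaffected. A guarded variant might look like the sketch below, where pick_one is a hypothetical helper name and not part of any package:

```r
# Data from the question, exactly as posted.
id   <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,1,4,15,16,17,18,19,20)
freq <- c(1,3,2,4,5,6,6,7,8,3,3,1,6,9,9,1,1,4,3,7,7)
df   <- data.frame(id, freq)
r    <- c(1,3,4,6,7)

# Hypothetical helper: sample an index rather than the values themselves,
# so a length-one numeric vector is returned as-is and not read as 1:n.
pick_one <- function(x) x[sample.int(length(x), 1L)]

# Same aggregate() call as above, with the guarded sampler as FUN.
res <- aggregate(id ~ freq, df, subset = freq %in% r, FUN = pick_one)
res
```

Sampling indices with sample.int() keeps the behavior identical for groups of two or more ids and only changes the single-id case.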

Update

If the vector r contains duplicate values and you want to sample df so that each value is drawn as many times as it occurs in r:

  r <- c(1,3,3,4,6,7)
  res <- do.call(rbind, lapply(split(r, r), function(x) {
    x1 <- df[df$freq %in% x, ]
    x1[sample(1:nrow(x1), length(x), replace=FALSE), ]
  }))
  row.names(res) <- NULL

You can use filter and sample_n from "dplyr":

 library(dplyr)
 set.seed(1)
 df %>% filter(freq %in% r) %>% group_by(freq) %>% sample_n(1)
 # Source: local data frame [5 x 2]
 # Groups: freq
 #
 #   id freq
 # 1 12    1
 # 2 10    3
 # 3 17    4
 # 4 13    6
 # 5  8    7
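
In dplyr 1.0.0 and later, sample_n() is superseded by slice_sample(). A sketch of the same pipeline, assuming such a version is installed, could look like this:

```r
library(dplyr)

# Data from the question, exactly as posted.
id   <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,1,4,15,16,17,18,19,20)
freq <- c(1,3,2,4,5,6,6,7,8,3,3,1,6,9,9,1,1,4,3,7,7)
df   <- data.frame(id, freq)
r    <- c(1,3,4,6,7)

set.seed(1)
res <- df %>%
  filter(freq %in% r) %>%     # keep only rows whose freq occurs in r
  group_by(freq) %>%          # one group per freq value
  slice_sample(n = 1) %>%     # draw one random row from each group
  ungroup()
res
```

The rows drawn for a given seed may differ from the sample_n() output shown above.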

Have you tried using match() or the %in% operator? This may not be the quickest or cleanest solution, but it uses only base R functions:

 rUnique <- unique(r)
 df2 <- df[df$freq %in% rUnique, ]
 x <- data.frame(id = NA, freq = rUnique)
 for (i in 1:length(rUnique)) {
   x[i, 1] <- sample(df2[df2[, 2] == rUnique[i], 1], 1)
 }
 print(x)
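
The match() route mentioned above can also be written without a loop. This is a minimal sketch, assuming every value of r occurs in df$freq (a missing value would produce a row of NAs) and that r has no duplicates (a repeated value would return the same row twice). It shuffles the rows once and then takes the first row that matches each value of r, which is a uniformly random pick among the rows sharing that freq:

```r
# Data from the question, exactly as posted.
id   <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,1,4,15,16,17,18,19,20)
freq <- c(1,3,2,4,5,6,6,7,8,3,3,1,6,9,9,1,1,4,3,7,7)
df   <- data.frame(id, freq)
r    <- c(1,3,4,6,7)

# Put the rows in random order.
shuffled <- df[sample(nrow(df)), ]

# match() returns the position of the first occurrence of each value of r,
# so after shuffling this is a random row among those with that freq.
res <- shuffled[match(r, shuffled$freq), ]
row.names(res) <- NULL
res
```

Because the sampling is done on row positions, this also avoids calling sample() on a single id.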