Correspondence of vector values for records in a data frame in R

Question


I have a vector r with the following values:

  r<-c(1,3,4,6,7)

and a data frame df with 21 records and two columns:

  id <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,1,4,15,16,17,18,19,20)
  freq <- c(1,3,2,4,5,6,6,7,8,3,3,1,6,9,9,1,1,4,3,7,7)
  df <- data.frame(id, freq)

Using the vector r, I need to extract a sample of records from df, as a new data frame, such that the freq values of the selected records equal the values in r. If several records have the same freq value, one of them should be selected at random. For example, one possible result is:

  id freq
  12    1
  10    3
   4    4
   7    6
   8    7

I would appreciate it if someone could help me with this.


r sampling

Ali civil May 01 '15 at 14:19


3 answers

You can use filter and sample_n from "dplyr":

 library(dplyr)
 set.seed(1)
 df %>%
   filter(freq %in% r) %>%
   group_by(freq) %>%
   sample_n(1)
 # Source: local data frame [5 x 2]
 # Groups: freq
 #
 #   id freq
 # 1 12    1
 # 2 10    3
 # 3 17    4
 # 4 13    6
 # 5  8    7
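In dplyr 1.0.0 and later, sample_n() is superseded by slice_sample(). A minimal sketch of the same pipeline with slice_sample(), assuming a recent dplyr is installed and using the data from the question; the sampled ids may differ from the output above even with the same seed:

```r
library(dplyr)

# Data from the question (df has 21 rows)
r <- c(1, 3, 4, 6, 7)
df <- data.frame(
  id   = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 1, 4, 15, 16, 17, 18, 19, 20),
  freq = c(1, 3, 2, 4, 5, 6, 6, 7, 8, 3, 3, 1, 6, 9, 9, 1, 1, 4, 3, 7, 7)
)

set.seed(1)
res <- df %>%
  filter(freq %in% r) %>%  # keep only records whose freq occurs in r
  group_by(freq) %>%       # one group per distinct freq value
  slice_sample(n = 1) %>%  # draw one random record from each group
  ungroup()
print(res)
```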


A5C1D2H2I1M1N2O1R2T1 May 01, '15 at 14:25


Have you tried the match() function or the %in% operator? This may not be the fastest or cleanest solution, but it uses only base R functions:

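 rUnique <- unique(r)
 df2 <- df[df$freq %in% rUnique, ]
 x <- data.frame(id = NA, freq = rUnique)
 for (i in seq_along(rUnique)) {
   ids <- df2[df2$freq == rUnique[i], "id"]   # ids of the records with this freq
   x[i, "id"] <- ids[sample(length(ids), 1)]  # index by position: sample(ids, 1) draws from 1:ids when ids has one element
 }
 print(x)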
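The answer mentions match() without using it. A minimal base R sketch of how match() alone can do the selection, assuming every value of r occurs in df$freq: shuffle the rows of df once, then match() returns, for each value of r, the first matching row in that random order, which is a uniformly random choice among the matching records. Unlike the loop, this returns one row per element of r (a repeated value yields the same record twice), and an NA row for a value with no match.

```r
# Data from the question
r <- c(1, 3, 4, 6, 7)
df <- data.frame(
  id   = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 1, 4, 15, 16, 17, 18, 19, 20),
  freq = c(1, 3, 2, 4, 5, 6, 6, 7, 8, 3, 3, 1, 6, 9, 9, 1, 1, 4, 3, 7, 7)
)

set.seed(1)
shuffled <- df[sample.int(nrow(df)), ]      # rows of df in random order
res <- shuffled[match(r, shuffled$freq), ]  # first (hence random) match for each value of r
row.names(res) <- NULL
print(res)
```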


hsl May 01, '15 at 14:39


akrun · Accepted Answer · 2015-05-01T14:33:47+0000

You can try data.table:

 library(data.table)
 setDT(df)[freq %in% r, sample(id, 1L), by = freq]

Or, using base R:

 aggregate(id ~ freq, data = df, subset = freq %in% r, FUN = sample, 1L)
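A caveat that applies to both calls above: when sample() receives a single number n >= 1, it draws from 1:n instead of returning n. With the question's data every freq value in r matches at least two records, so the problem does not show up, but a freq value matched by a single record could yield an id that does not belong to it. A minimal base R sketch of a safer selection; the helper name pick_one is hypothetical, introduced only for this example:

```r
# Data from the question
r <- c(1, 3, 4, 6, 7)
df <- data.frame(
  id   = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 1, 4, 15, 16, 17, 18, 19, 20),
  freq = c(1, 3, 2, 4, 5, 6, 6, 7, 8, 3, 3, 1, 6, 9, 9, 1, 1, 4, 3, 7, 7)
)

# Hypothetical helper: return one element of x, chosen uniformly at random.
# Indexing by a sampled position avoids sample()'s special case for a
# single number: sample(20, 1) draws from 1:20, it does not return 20.
pick_one <- function(x) x[sample.int(length(x), 1L)]

print(pick_one(20))  # always 20

set.seed(1)
res <- aggregate(id ~ freq, data = df, subset = freq %in% r, FUN = pick_one)
print(res)
```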

Update

If r contains duplicate values and you want to sample from df as many records for each unique value as the number of times that value occurs in r:

  r <- c(1, 3, 3, 4, 6, 7)
  res <- do.call(rbind, lapply(split(r, r), function(x) {
    x1 <- df[df$freq %in% x, ]
    x1[sample(1:nrow(x1), length(x), replace = FALSE), ]
  }))
  row.names(res) <- NULL
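A minimal vectorized base R sketch of the same idea, offered as an alternative rather than as the answer's method: shuffle df, number the occurrences of each value both in the shuffled freq column and in r with ave(), then match() on the (value, occurrence) pairs. The k-th copy of a value in r is paired with the k-th matching record in random order, so repeated values in r get distinct records. It assumes each value occurs in df$freq at least as often as in r; otherwise the unmatched copies come back as NA rows, whereas the sample(..., replace = FALSE) version stops with an error.

```r
# Data from the question, with a repeated value in r
r <- c(1, 3, 3, 4, 6, 7)
df <- data.frame(
  id   = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 1, 4, 15, 16, 17, 18, 19, 20),
  freq = c(1, 3, 2, 4, 5, 6, 6, 7, 8, 3, 3, 1, 6, 9, 9, 1, 1, 4, 3, 7, 7)
)

set.seed(1)
shuffled <- df[sample.int(nrow(df)), ]  # rows of df in random order

# Occurrence number of each value: 1 for its first copy, 2 for its second, ...
shuffled_k <- ave(shuffled$freq, shuffled$freq, FUN = seq_along)
r_k <- ave(r, r, FUN = seq_along)

# Pair the k-th copy of each value in r with the k-th shuffled record having that freq
res <- shuffled[match(paste(r, r_k), paste(shuffled$freq, shuffled_k)), ]
row.names(res) <- NULL
print(res)
```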
