Assigning a fixed number of values according to a probability distribution (in R)

Hi and thanks for the help!

I am trying to create a vector with a fixed number of values that are assigned according to a probability distribution. For example, I need a vector of length 31 containing 26 zeros and 5 ones (the vector should always sum to 5). However, the positions of the ones matter. To decide which elements should be one and which should be zero, I have a probability vector (also of length 31) that looks like this:

    probs <- c(0.01, 0.02, 0.01, 0.02, 0.01, 0.01, 0.01, 0.04, 0.01, 0.01,
               0.12, 0.01, 0.02, 0.01, 0.14, 0.06, 0.01, 0.01, 0.01, 0.01,
               0.01, 0.14, 0.01, 0.07, 0.01, 0.01, 0.04, 0.08, 0.01, 0.02, 0.01)

I can select values according to this distribution and get a vector of length 31 using rbinom, but I cannot get it to select exactly five values:

 Inv=rbinom(length(probs),1,probs) Inv [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 
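To see why `rbinom()` alone rarely gives five 1s: each call makes one independent Bernoulli draw per position, so the number of 1s is itself random, with mean `sum(probs)`, which is only 0.96 for this vector. A quick simulation sketch (the 10,000 replicates and the seed are arbitrary choices, not part of the question):

```r
probs <- c(0.01, 0.02, 0.01, 0.02, 0.01, 0.01, 0.01, 0.04, 0.01, 0.01,
           0.12, 0.01, 0.02, 0.01, 0.14, 0.06, 0.01, 0.01, 0.01, 0.01,
           0.01, 0.14, 0.01, 0.07, 0.01, 0.01, 0.04, 0.08, 0.01, 0.02, 0.01)

set.seed(42)
# number of 1s produced by each of 10,000 independent rbinom() calls
counts <- replicate(10000, sum(rbinom(length(probs), 1, probs)))

sum(probs)          # expected number of 1s per call
mean(counts)        # simulated average, close to sum(probs)
mean(counts == 5)   # fraction of calls that happen to give exactly five 1s
table(counts)       # full distribution of the count
```

Because the average count is below one, a draw with exactly five 1s is a rare event, which is what the rejection-style answers below have to wait for.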

Any ideas?

Thanks again!

3 answers

How about simply using a weighted sample.int to select the positions?

    > Inv <- integer(31)
    > Inv[sample.int(31, 5, prob = probs)] <- 1
    > Inv
     [1] 0 0 0 1 0 1 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
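A quick way to check this approach, sketched with a hypothetical helper `draw_positions()`: every draw contains exactly five 1s by construction. One thing worth knowing is that with `replace = FALSE`, `sample.int()` applies the weights sequentially (each pick is proportional to the weights of the items not yet chosen), so the long-run frequency with which a position is selected is generally not `5 * probs`, and in general it also differs from the rejection approach in the other answers. Whether that matters depends on what the probabilities are meant to represent.

```r
probs <- c(0.01, 0.02, 0.01, 0.02, 0.01, 0.01, 0.01, 0.04, 0.01, 0.01,
           0.12, 0.01, 0.02, 0.01, 0.14, 0.06, 0.01, 0.01, 0.01, 0.01,
           0.01, 0.14, 0.01, 0.07, 0.01, 0.01, 0.04, 0.08, 0.01, 0.02, 0.01)

# hypothetical helper: a 0/1 vector with exactly k ones, positions chosen by
# weighted sampling without replacement (sample.int's default)
draw_positions <- function(probs, k) {
  out <- integer(length(probs))
  out[sample.int(length(probs), k, prob = probs)] <- 1L
  out
}

set.seed(1)
draws <- t(replicate(5000, draw_positions(probs, 5)))  # one draw per row

all(rowSums(draws) == 5)      # TRUE: every draw has exactly five 1s
inclusion <- colMeans(draws)  # empirical selection frequency per position
round(cbind(probs, five_times_probs = 5 * probs, inclusion), 3)
```

Comparing the `inclusion` column with `five_times_probs` shows how far the sequential weighting departs from simple proportionality for this particular `probs`.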

Chase gives a great answer and mentions the potential for a runaway while() loop. One problem with a while() loop is that it tests one trial at a time: if it takes many trials, say t, to find one with the target number of 1s, you incur the overhead of t calls to the underlying function, rbinom() in this case.

However, there is a way around this. Because rbinom(), like all the (pseudo) random number generators in R, is vectorized, we can generate m trials at once and check all m of them for the requirement of five 1s. If none qualifies, we draw another m trials, and repeat until we find one that does. This idea is implemented in the foo() function below. The chunkSize argument is m, the number of trials drawn at a time. I also took the opportunity to let the function find more than one conforming trial; the argument n sets how many conforming trials are returned.
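The core of the trick is a single vectorized call. `rbinom()` recycles `probs` across a long vector of draws, so filling a matrix with `ncol = length(probs)` and `byrow = TRUE` puts one complete trial in each row, with column j always drawn using `probs[j]`. A small sketch to confirm that alignment (the chunk size `m` and the seed are arbitrary):

```r
probs <- c(0.01, 0.02, 0.01, 0.02, 0.01, 0.01, 0.01, 0.04, 0.01, 0.01,
           0.12, 0.01, 0.02, 0.01, 0.14, 0.06, 0.01, 0.01, 0.01, 0.01,
           0.01, 0.14, 0.01, 0.07, 0.01, 0.01, 0.04, 0.08, 0.01, 0.02, 0.01)
len <- length(probs)
m <- 20000  # number of trials drawn in one call (arbitrary)

set.seed(2)
# len * m draws; probs is recycled, so draw i uses probs[(i - 1) %% len + 1]
trial <- matrix(rbinom(len * m, 1, probs), ncol = len, byrow = TRUE)

dim(trial)                          # m rows (trials) by len columns (positions)
max(abs(colMeans(trial) - probs))   # small: column j behaves like probs[j]
sum(rowSums(trial) == 5)            # how many trials already have five 1s
```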

    foo <- function(probs, target, n = 1, chunkSize = 100) {
        len <- length(probs)
        out <- matrix(ncol = len, nrow = 0)   ## return object
        found <- 0L                           ## conforming trials found so far
        ## keep drawing chunks until we have found n conforming trials
        while (found < n) {
            ## draw chunkSize trials at once, one trial per row
            trial <- matrix(rbinom(len * chunkSize, 1, probs),
                            ncol = len, byrow = TRUE)
            rs <- rowSums(trial)                 ## how many 1s in each trial
            ok <- which(rs == target)            ## which trials meet the target
            take <- min(n - found, length(ok))   ## keep only as many as still needed
            if (take > 0) {
                out <- rbind(out, trial[ok[seq_len(take)], , drop = FALSE])
                found <- found + take
            }
        }
        if (n == 1L)          ## comment out this line, and
            out <- drop(out)  ## this one, if you don't want dimension dropping
        out
    }

It works as follows:

    > set.seed(1)
    > foo(probs, target = 5)
     [1] 1 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0
    [31] 0
    > foo(probs, target = 5, n = 2)
         [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
    [1,]    0    0    0    0    0    0    0    0    0     0     0
    [2,]    0    0    0    0    0    0    0    0    0     0     1
         [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21]
    [1,]     0     0     0     1     1     0     0     0     0     0
    [2,]     0     1     0     0     1     0     0     0     0     0
         [,22] [,23] [,24] [,25] [,26] [,27] [,28] [,29] [,30] [,31]
    [1,]     1     0     1     0     0     0     1     0     0     0
    [2,]     1     0     1     0     0     0     0     0     0     0

Note that I drop the redundant dimension when n == 1, so a plain vector is returned. Comment out the final if statement if you don't want this behavior.

You need to balance chunkSize against the computational cost of checking that many trials at once. If the requirement (here, five 1s) is very unlikely, increase chunkSize so that you make fewer calls to rbinom(). If the requirement is quite likely, there is little point in drawing and checking a large chunk of trials when you only want one or two, because every trial in the chunk has to be generated and evaluated.
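One way to judge how unlikely the requirement is: the number of 1s in a single `rbinom()` trial follows a Poisson-binomial distribution, and its exact probabilities can be built up one position at a time by a small convolution. The helper name `count_dist()` below is hypothetical, and this is a sketch rather than a tuned implementation:

```r
probs <- c(0.01, 0.02, 0.01, 0.02, 0.01, 0.01, 0.01, 0.04, 0.01, 0.01,
           0.12, 0.01, 0.02, 0.01, 0.14, 0.06, 0.01, 0.01, 0.01, 0.01,
           0.01, 0.14, 0.01, 0.07, 0.01, 0.01, 0.04, 0.08, 0.01, 0.02, 0.01)

# hypothetical helper: exact distribution of the number of 1s in one trial;
# returns d with d[k + 1] = P(exactly k ones), for k = 0..length(probs)
count_dist <- function(probs) {
  d <- 1  # before any position is drawn: P(0 ones) = 1
  for (p in probs) {
    # this position is 0 with prob 1 - p (count unchanged)
    # or 1 with prob p (count shifts up by one)
    d <- c(d * (1 - p), 0) + c(0, d * p)
  }
  d
}

d <- count_dist(probs)
p_target <- d[5 + 1]  # P(exactly five 1s in one trial)
p_target
1 / p_target          # expected number of trials per conforming trial
```

As a rule of thumb (not from the original answer), a chunkSize around `1 / p_target` means each chunk is expected to contain about one conforming trial, which is a reasonable starting point for the trade-off described above.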


I think you want to draw from a binomial distribution with the given set of probabilities until you hit the target value of 5, right? If so, I think this does what you want. The while loop repeats the draw until the condition is met. If you feed it very unrealistic probabilities and target values, I think this could turn into a runaway loop, so bear that in mind :)

    FOO <- function(probs, target) {
        out <- rbinom(length(probs), 1, probs)
        while (sum(out) != target) {
            out <- rbinom(length(probs), 1, probs)
        }
        return(out)
    }


    > FOO(probs, target = 5)
     [1] 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 0 0 1 0
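To guard against the runaway loop, one option is to cap the number of attempts and fail loudly instead of spinning forever. This is a sketch; the name `FOO_capped` and the `maxTries` argument are hypothetical additions, not part of the answer above:

```r
probs <- c(0.01, 0.02, 0.01, 0.02, 0.01, 0.01, 0.01, 0.04, 0.01, 0.01,
           0.12, 0.01, 0.02, 0.01, 0.14, 0.06, 0.01, 0.01, 0.01, 0.01,
           0.01, 0.14, 0.01, 0.07, 0.01, 0.01, 0.04, 0.08, 0.01, 0.02, 0.01)

# hypothetical variant of FOO() that gives up after maxTries draws
FOO_capped <- function(probs, target, maxTries = 1e5) {
  for (i in seq_len(maxTries)) {
    out <- rbinom(length(probs), 1, probs)
    if (sum(out) == target) return(out)  # first draw that meets the target
  }
  stop("no draw with ", target, " ones after ", maxTries, " tries")
}

set.seed(3)
x <- FOO_capped(probs, target = 5)
sum(x)  # 5

# an impossible target (more ones than positions) now errors instead of hanging
msg <- tryCatch(FOO_capped(probs, target = 40, maxTries = 100),
                error = conditionMessage)
msg     # "no draw with 40 ones after 100 tries"
```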