Assigning a fixed number of values according to a probability distribution (in R)

Question

Hi and thanks for the help!

I am trying to create a vector with a fixed number of values that are assigned according to a probability distribution. For example, I need a vector of length 31 containing 26 zeros and 5 ones (the sum of the vector should always be 5). However, the positions of the ones matter. To determine which entries should be one and which should be zero, I have a probability vector (length 31) that looks like this:

probs<-c(0.01,0.02,0.01,0.02,0.01,0.01,0.01,0.04,0.01,0.01,0.12,0.01,0.02,0.01, 0.14,0.06,0.01,0.01,0.01,0.01,0.01,0.14,0.01,0.07,0.01,0.01,0.04,0.08,0.01,0.02,0.01)

I can draw values according to this distribution and get a vector of length 31 using rbinom, but I cannot force it to contain exactly five ones.

 Inv=rbinom(length(probs),1,probs) Inv [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0

Any ideas?

Thanks again!

+7

vector r probability

Laura Aug 4 '11 at 3:50

3 answers

Chase gives a great answer and mentions the problem of a run-away while() loop. One issue with the while() approach is that if you draw one trial at a time and it takes many trials, say t, to find one that matches the target number of 1s, you incur the overhead of t calls to the main function, rbinom() in this case.

There is a way around this, however. Because rbinom(), like all the (pseudo)random number generators in R, is vectorized, we can generate m trials at a time and check those m trials against the requirement of five 1s. If none conform, we repeatedly draw another m trials until we find one that does. This idea is implemented in the function foo() below. The chunkSize argument is m, the number of trials to draw at a time. I also took the opportunity to let the function find more than one conforming trial; the argument n controls how many conforming trials are returned.

    foo <- function(probs, target, n = 1, chunkSize = 100) {
        len <- length(probs)
        out <- matrix(ncol = len, nrow = 0)   ## return object
        ## draw chunkSize trials, one trial per row
        trial <- matrix(rbinom(len * chunkSize, 1, probs),
                        ncol = len, byrow = TRUE)
        rs <- rowSums(trial)        ## how many `1`s in each trial
        ok <- which(rs == target)   ## which trials meet the `target`
        found <- length(ok)         ## how many meet the target
        if(found > 0)               ## if we found some, add them to out
            out <- rbind(out,
                         trial[ok, , drop = FALSE][seq_len(min(n, found)), , drop = FALSE])
        ## if we haven't found enough, repeat the whole thing until we do
        while(found < n) {
            trial <- matrix(rbinom(len * chunkSize, 1, probs),
                            ncol = len, byrow = TRUE)
            rs <- rowSums(trial)
            ok <- which(rs == target)
            New <- length(ok)
            if(New > 0) {
                ## keep only as many rows as are still needed to reach n
                out <- rbind(out,
                             trial[ok, , drop = FALSE][seq_len(min(n - found, New)), , drop = FALSE])
                found <- found + New
            }
        }
        if(n == 1L)              ## comment this, and
            out <- drop(out)     ## this if you don't want dimension dropping
        out
    }

It works as follows:

    > set.seed(1)
    > foo(probs, target = 5)
     [1] 1 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0
    [31] 0
    > foo(probs, target = 5, n = 2)
         [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
    [1,]    0    0    0    0    0    0    0    0    0     0     0
    [2,]    0    0    0    0    0    0    0    0    0     0     1
         [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21]
    [1,]     0     0     0     1     1     0     0     0     0     0
    [2,]     0     1     0     0     1     0     0     0     0     0
         [,22] [,23] [,24] [,25] [,26] [,27] [,28] [,29] [,30] [,31]
    [1,]     1     0     1     0     0     0     1     0     0     0
    [2,]     1     0     1     0     0     0     0     0     0     0

Note that I drop the empty dimension in the case where n == 1. Comment out the last if statement if you don't want this behaviour.

You need to balance the size of chunkSize against the computational burden of checking that many trials at a time. If the requirement (here five 1s) is very unlikely, increase chunkSize so that you incur fewer calls to rbinom(). If the requirement is likely, there is little point in drawing and checking a large chunkSize of trials at a time when you only want one or two, because every trial drawn has to be evaluated.
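One way to judge how unlikely the requirement is before choosing chunkSize is to compute the probability that a single trial has exactly the target number of 1s. The sketch below is not part of the answer above: count_pmf() is a hypothetical helper that builds the distribution of the number of 1s among independent draws by folding in one probability at a time, and it assumes the probs vector from the question.

```r
probs <- c(0.01,0.02,0.01,0.02,0.01,0.01,0.01,0.04,0.01,0.01,0.12,0.01,0.02,0.01,
           0.14,0.06,0.01,0.01,0.01,0.01,0.01,0.14,0.01,0.07,0.01,0.01,0.04,0.08,
           0.01,0.02,0.01)

## Hypothetical helper: exact distribution of the number of 1s when each
## position is drawn independently with its own probability.
count_pmf <- function(probs) {
    pmf <- 1                          # before any draw, P(count = 0) is 1
    for (p in probs) {
        ## the new draw is either 0 (count unchanged) or 1 (count shifts up by one)
        pmf <- c(pmf * (1 - p), 0) + c(0, pmf * p)
    }
    pmf                               # pmf[k + 1] holds P(count == k)
}

pmf   <- count_pmf(probs)
p_hit <- pmf[5 + 1]                   # chance that one trial has exactly five 1s
p_hit
1 / p_hit                             # average number of trials per accepted draw
```

The entries of probs sum to 0.96, so a trial with five 1s is rare here. The reciprocal of p_hit gives a rough idea of how many trials a chunk needs before it is likely to contain at least one conforming draw.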

+6

Gavin Simpson Aug 4 '11 at 8:40

I think you want to resample from a binomial distribution with a given set of probabilities until you hit your target value of 5, right? If so, then I think this does what you want. A while loop can be used to iterate until the condition is met. If you feed it very unrealistic probabilities and target values, I guess it could turn into a run-away function, so consider yourself warned :)

    FOO <- function(probs, target) {
        out <- rbinom(length(probs), 1, probs)
        while (sum(out) != target) {
            out <- rbinom(length(probs), 1, probs)
        }
        return(out)
    }

    > FOO(probs, target = 5)
    [1] 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 0 0 1 0
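To guard against the run-away case mentioned above, one option is to cap the number of attempts. This is only a sketch: FOO_capped() and its max_tries argument are hypothetical names, not part of the original answer, and the default cap is an arbitrary choice.

```r
probs <- c(0.01,0.02,0.01,0.02,0.01,0.01,0.01,0.04,0.01,0.01,0.12,0.01,0.02,0.01,
           0.14,0.06,0.01,0.01,0.01,0.01,0.01,0.14,0.01,0.07,0.01,0.01,0.04,0.08,
           0.01,0.02,0.01)

## Hypothetical variant of FOO() that gives up after max_tries attempts
## instead of looping forever on an unreachable target.
FOO_capped <- function(probs, target, max_tries = 1e5) {
    for (i in seq_len(max_tries)) {
        out <- rbinom(length(probs), 1, probs)
        if (sum(out) == target) return(out)   # accept the first conforming draw
    }
    stop("no draw with sum == ", target, " found in ", max_tries, " tries")
}

set.seed(42)
x <- FOO_capped(probs, target = 5)
sum(x)
```

An impossible target, such as one larger than length(probs), then ends with an error after max_tries attempts rather than hanging.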

+5

Chase Aug 4 '11 at 4:11

James · Accepted Answer · 2011-08-04T11:30:39+0000

How about simply using a weighted sample.int to select the locations?

    Inv<-integer(31)
    Inv[sample.int(31,5,prob=probs)]<-1
    Inv
    [1] 0 0 0 1 0 1 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
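The same idea can be wrapped in a small function and checked empirically. In the sketch below, place_ones() is a hypothetical helper name and the 2000 repetitions are an arbitrary choice. Note that sample.int() with prob draws positions one after another without replacement, so it should not be assumed to give the same distribution as redrawing rbinom() until the sum equals the target; it does guarantee exactly five 1s on every call.

```r
probs <- c(0.01,0.02,0.01,0.02,0.01,0.01,0.01,0.04,0.01,0.01,0.12,0.01,0.02,0.01,
           0.14,0.06,0.01,0.01,0.01,0.01,0.01,0.14,0.01,0.07,0.01,0.01,0.04,0.08,
           0.01,0.02,0.01)

## Hypothetical helper: put `target` 1s at positions drawn without
## replacement, weighted by `probs`.
place_ones <- function(probs, target) {
    out <- integer(length(probs))
    out[sample.int(length(probs), target, prob = probs)] <- 1L
    out
}

set.seed(1)
draws <- t(replicate(2000, place_ones(probs, 5)))   # one draw per row
all(rowSums(draws) == 5)     # every draw has exactly five 1s
round(colMeans(draws), 2)    # share of draws with a 1 at each position
```

Positions with larger entries in probs should show up with a 1 more often in the last line of output.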
