If iterable is given, select each element independently with probability p

I need a function to do the following in Python.

I need to iterate over the input list and select each element with probability p (which is not enough).

Here is a naive implementation to clearly understand what I'm doing.

def f(inputList,p):
    for element in inputList:
        if random.random()<p:
            yield element

Each random number generation is expensive. We can do fewer random number generations by calculating initially how long it will take until the first success, and then jump to this record. I have a method, but I wonder if there is something already existing or a better way to code it. Basically, I just need this for lists, but I would like for something to work on a general iterable.

def jump_calculation(p):
    if p == 1:
        return 0
    r = random.random()
    k = int(scipy.log(1-r)/scipy.log(1-p))
    return k
def binomial_choice(L,p):
    jump = jump_calculation(p)
    for element in L:
        jump -= 1
        if jump<0:
            yield element
            jump = jump_calculation(p)

Am I reinventing the wheel? If not, are obvious improvements making code easier to read?

+4
2

. , numpy.random.geometric:

import itertools
import numpy

def binomial_choice(L, p):
    iterator = iter(L)
    while True:
        to_skip = numpy.random.geometric(p) - 1
        yield next(itertools.islice(iterator, to_skip, None))

, .

Python 3.5+ - PEP 479:

def binomial_choice(L, p):
    iterator = iter(L)
    try:
        while True:
            to_skip = numpy.random.geometric(p) - 1
            yield next(itertools.islice(iterator, to_skip, None))
    except StopIteration:
        return

:

In [1]: list(binomial_choice(range(100), 0.05))
Out[1]: [9, 15, 31, 53, 92, 93]

In [2]: list(binomial_choice(range(5), 1))
Out[2]: [0, 1, 2, 3, 4]

:

In [5]: sum(len(list(binomial_choice(range(100), 0.05))) for i in range(100000)) / 100000
Out[5]: 4.99883

, :

In [14]: timeit list(binomial_choice_geometric(range(1000), 0.01))
10000 loops, best of 3: 24.4 µs per loop

In [11]: timeit list(binomial_choice_geometric_3_5(range(1000), 0.01))
10000 loops, best of 3: 42.7 µs per loop

In [12]: timeit list(binomial_choice_jump_calculation(range(1000), 0.01))
1000 loops, best of 3: 596 µs per loop

In [13]: timeit list(binomial_choice_foreach_random(range(1000), 0.01))
1000 loops, best of 3: 203 µs per loop

random.sample ( numpy.random.binomial ), len :

In [19]: timeit list(binomial_choice_random_sample(range(1000), 0.01))
10000 loops, best of 3: 19.8 µs per loop
+4

, , , .. random.sample() . , , . , , .

from random import sample
a = range(100)
probability = 0.5
max_sz = int(len(a) * probability)
sz = randint(0, max_sz)
print sample(a, sz)
# [34, 81, 58, 52, 9, 86, 57, 29, 3, 99]

P.S. , , . . , .

+1

All Articles