Random chain from itertools
I am copying an example from the Python docs:
```python
from itertools import chain, combinations

def powerset(iterable):
    "powerset([1,2,3]) --> () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)"
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s)+1))
```
How can we randomize the order of the values we get while the powerset
result remains lazily evaluated?
EDIT: The reason I want this is that I would like to calculate the sum of each derived set and stop as soon as I find two sets with the same sum. If I am not mistaken, the problem is NP-complete.
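As an aside, the early-stopping search described in the edit can be sketched like this (`find_equal_sum_pair` is a hypothetical helper name, not part of any answer; it uses the standard, non-random powerset order):

```python
from itertools import chain, combinations

def powerset(iterable):
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s)+1))

def find_equal_sum_pair(iterable):
    # Map each sum to the first subset that produced it; stop as soon
    # as the same sum shows up a second time.
    seen = {}
    for subset in powerset(iterable):
        total = sum(subset)
        if total in seen:
            return seen[total], subset  # two distinct subsets, same sum
        seen[total] = subset
    return None

print(find_equal_sum_pair([1, 2, 3]))  # ((3,), (1, 2))
```

The generator is consumed lazily, so the search stops at the first collision without materializing the whole powerset.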
itertools.combinations() gives us the results in the order of its input. Given this, we can shuffle our input list to produce a random order of elements (obviously, this allows far fewer possible orders of the result than a full shuffle would).
```python
from itertools import chain, combinations
from random import shuffle

def random_powerset(iterable):
    s = list(iterable)
    lengths = list(range(len(s)+1))
    shuffle(lengths)
    return chain.from_iterable(
        combinations(s, r) for r in lengths if not shuffle(s))
```
(This is a bit of an ugly hack: shuffle(s) always returns None, which is falsy, so not shuffle(s) is always true. The condition therefore never filters anything out, and it reshuffles s before every combinations() call.)
We pre-generate the list of lengths so that we can shuffle it as well.
This is not entirely random (there is still structure: all subsets of length n cluster together, for example, and within each cluster the subsets follow the random order of the input), but there is a lot of randomness, if that is enough for you.
Output Example:
```
>>> list(random_powerset(range(3)))
[(), (2,), (0,), (1,), (2, 1), (2, 0), (1, 0), (1, 2, 0)]
>>> list(random_powerset(range(3)))
[(), (0, 1), (0, 2), (1, 2), (0, 1, 2), (2,), (0,), (1,)]
>>> list(random_powerset(range(3)))
[(0, 1, 2), (2,), (1,), (0,), (0, 2), (0, 1), (2, 1), ()]
>>> list(random_powerset(range(3)))
[(1, 2, 0), (0,), (2,), (1,), (), (0, 1), (0, 2), (1, 2)]
>>> list(random_powerset(range(3)))
[(), (2, 1), (2, 0), (1, 0), (0,), (2,), (1,), (2, 1, 0)]
>>> list(random_powerset(range(3)))
[(1, 0), (1, 2), (0, 2), (0, 2, 1), (), (1,), (0,), (2,)]
```
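Because the result is still a generator, only what is requested actually gets computed. A quick sketch to check this, using itertools.islice on an input far too large to enumerate fully:

```python
from itertools import chain, combinations, islice
from random import shuffle

def random_powerset(iterable):
    s = list(iterable)
    lengths = list(range(len(s)+1))
    shuffle(lengths)
    return chain.from_iterable(
        combinations(s, r) for r in lengths if not shuffle(s))

# Take only 4 of the 2**30 subsets; this returns immediately
# instead of building a billion tuples.
sample = list(islice(random_powerset(range(30)), 4))
print(len(sample))  # 4
```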
I think this is the best you can do while keeping it lazy.
Here's another idea: store the combination generators and draw from them at random until everything is consumed. This also randomizes the order across the different subset sizes.
Edit: I assume you do not care about the order of the elements within a set, since you sum them up. If you do, you can shuffle next_value before yielding it.
```python
import itertools
import random

def random_powerset(l):
    combs = [itertools.combinations(l, i) for i in range(len(l)+1)]
    while combs:
        comb_index = random.choice(range(len(combs)))
        try:
            next_value = next(combs[comb_index])
            yield next_value
        except StopIteration:
            combs.pop(comb_index)
```
Output example:
```
In : list(random_powerset(range(3)))
Out: [(0, 1), (0, 2), (0, 1, 2), (1, 2), (), (0,), (1,), (2,)]
In : list(random_powerset(range(3)))
Out: [(0, 1, 2), (0,), (), (0, 1), (1,), (0, 2), (1, 2), (2,)]
In : list(random_powerset(range(3)))
Out: [(0, 1), (0, 1, 2), (0, 2), (), (0,), (1,), (1, 2), (2,)]
In : list(random_powerset(range(3)))
Out: [(), (0,), (0, 1), (0, 1, 2), (1,), (0, 2), (2,), (1, 2)]
In : list(random_powerset(range(3)))
Out: [(), (0, 1), (0,), (0, 1, 2), (1,), (0, 2), (2,), (1, 2)]
In : list(random_powerset(range(3)))
Out: [(0, 1), (0,), (0, 2), (1, 2), (), (1,), (2,), (0, 1, 2)]
In : list(random_powerset(range(3)))
Out: [(), (0, 1, 2), (0,), (1,), (2,), (0, 1), (0, 2), (1, 2)]
```
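The within-set shuffling suggested in the edit might look like this (`random_powerset_shuffled` is a hypothetical variant name; note that next_value must be converted to a list first, since random.shuffle cannot shuffle a tuple):

```python
import itertools
import random

def random_powerset_shuffled(l):
    combs = [itertools.combinations(l, i) for i in range(len(l)+1)]
    while combs:
        comb_index = random.randrange(len(combs))
        try:
            next_value = list(next(combs[comb_index]))
            random.shuffle(next_value)  # shuffle in place; needs a list
            yield tuple(next_value)
        except StopIteration:
            combs.pop(comb_index)

result = list(random_powerset_shuffled(range(3)))
print(len(result))  # 8
```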
You can improve Lattyware's solution a little if you go beyond itertools.chain:
```python
from itertools import combinations
from random import shuffle

def chain_random(iterables):
    iterables = list(iterables)
    icount = len(iterables)
    if icount == 0:
        return
    while icount > 1:
        shuffle(iterables)
        try:
            yield next(iterables[-1])
        except StopIteration:
            iterables.pop()
            icount -= 1
    for element in iterables[0]:
        yield element

def random_powerset(iterable):
    s = list(iterable)
    lengths = list(range(len(s)+1))
    shuffle(lengths)
    return chain_random(
        combinations(s, r) for r in lengths if not shuffle(s))
```
Output Example:
```
>>> list(random_powerset(range(3)))
[(), (2, 1, 0), (1, 0), (1, 2), (2,), (0, 2), (1,), (0,)]
>>> list(random_powerset(range(3)))
[(1, 0), (1, 2), (0, 2, 1), (2,), (), (0, 2), (0,), (1,)]
>>> list(random_powerset(range(3)))
[(0, 1), (), (0, 2), (0,), (1, 2), (2, 0, 1), (1,), (2,)]
>>> list(random_powerset(range(3)))
[(), (1, 2), (2,), (1, 0), (0,), (2, 0), (1,), (1, 0, 2)]
>>> list(random_powerset(range(3)))
[(0, 1), (), (2,), (0, 2), (1, 2), (1,), (1, 2, 0), (0,)]
>>> list(random_powerset(range(3)))
[(0, 2, 1), (0,), (), (2, 0), (1,), (2, 1), (2,), (0, 1)]
```
itertools is written in C, so chain_random will be slower than itertools.chain, but you get more randomization this way.
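A rough, machine-dependent way to check that claim with timeit; the absolute numbers are not meaningful, only the comparison (py_chain here is a stand-in for chain_random's Python-level loop, ignoring the shuffling):

```python
import timeit

setup = """
from itertools import chain, combinations

def gens():
    return (combinations(range(14), r) for r in range(15))

def py_chain(iterables):
    # Pure-Python equivalent of chain.from_iterable
    for it in iterables:
        for x in it:
            yield x
"""

c_time = timeit.timeit("sum(1 for _ in chain.from_iterable(gens()))",
                       setup=setup, number=10)
py_time = timeit.timeit("sum(1 for _ in py_chain(gens()))",
                        setup=setup, number=10)
print("C chain: %.4fs, Python chain: %.4fs" % (c_time, py_time))
```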
This is a lazy and random solution:
```python
import random

def powerset(seq):
    n = 2**len(seq)
    used = set()
    while len(used) < n:
        choice = random.randint(0, n - 1)
        if choice not in used:
            used.add(choice)
            binary = bin(choice)[2:].zfill(len(seq))
            yield [i[1] for i in zip(binary, seq) if i[0] == '1']
            # or, on Python 2.7 and later (note: the selectors must be
            # ints, since the character '0' is truthy):
            # yield list(itertools.compress(seq, map(int, binary)))

print(list(powerset([1, 2, 3])))
print(list(powerset([1, 2, 3])))
```

Output:

```
[[3], [1], [2, 3], [], [1, 2], [2], [1, 3], [1, 2, 3]]
[[2, 3], [1, 3], [1], [1, 2, 3], [1, 2], [2], [3], []]
```
If you consider the combinations of [1, 2, 3] in binary format:
```
n    123
0    000
1    001
2    010
3    011
4    100
5    101
6    110
7    111
```
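For example, decoding n = 5 (binary 101) back into the subset it represents:

```python
seq = [1, 2, 3]
# '101' selects the first and third elements of seq
binary = bin(5)[2:].zfill(len(seq))
subset = [x for bit, x in zip(binary, seq) if bit == '1']
print(subset)  # [1, 3]
```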
Each combination can be marked with a unique binary identifier, and there are always 2**len(seq) combinations. So:
- Randomly select an integer between 0 and 2**len(seq) - 1.
- Check that we have not used it before (if we have, draw again).
- Convert it to binary.
- Zip the binary digits with seq.
- Wherever a binary digit is '0', exclude the corresponding element from the output.
This is lazy and will work for a large seq.
A little caution:
A problem may occur, though it probably doesn't matter for your use case: toward the end of the sequence you may have to redraw many times, which can take a while. The probability of drawing an already-drawn number is g / 2**len(seq), where g is the number of successful draws so far, so the expected number of draws needed to find an unused number is:

```
n / (n - g)   # where n = 2**len(seq)
```
This is fine if n is small, or, for large n, as long as g << n (one or both of these conditions is very likely to hold, so neither should be a big problem). In fact, for large n you can drop used and skip the repetition check entirely, since the expected number of draws before the first repetition is on the order of n**0.5.
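A quick numeric check of the n / (n - g) formula above, for n = 2**10:

```python
n = 2**10  # 1024 possible subsets
for g in (0, 512, 1016, 1023):
    # expected number of draws until a fresh, unused number turns up
    print(g, n / (n - g))
```

At the start it takes one draw on average; at the halfway point two; and for the very last subset, about n draws.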