Random chain from itertools
I am copying an example from the Python docs.
```python
from itertools import chain, combinations

def powerset(iterable):
    "powerset([1,2,3]) --> () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)"
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s)+1))
```

How can we randomize the order of the values we get, while the powerset result remains lazily evaluated?
EDIT: The reason I want this is that I would like to compute the sum of each derived set and stop as soon as I find two sets with the same sum. If I am not mistaken, the problem is NP-complete.
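For concreteness, the stop-early search described in the edit could be sketched like this (`first_equal_sum_pair` is a hypothetical helper name; this uses the ordered recipe from the docs, not yet randomized):

```python
from itertools import chain, combinations

def powerset(iterable):
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

def first_equal_sum_pair(iterable):
    # Walk the (lazy) powerset and stop at the first pair of subsets
    # that share the same sum.
    seen = {}  # sum -> first subset that produced it
    for subset in powerset(iterable):
        total = sum(subset)
        if total in seen:
            return seen[total], subset
        seen[total] = subset
    return None  # all subset sums are distinct

print(first_equal_sum_pair([1, 2, 3]))  # -> ((3,), (1, 2))
```

Randomizing the iteration order only changes which colliding pair is found first, not whether one exists.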
itertools.combinations() yields its results in an order determined by the input. Given this, we can shuffle the input list to produce a random ordering of elements (obviously, this reaches far fewer than all possible orderings of the result).
```python
from itertools import chain, combinations
from random import shuffle

def random_powerset(iterable):
    s = list(iterable)
    lengths = list(range(len(s)+1))
    shuffle(lengths)
    return chain.from_iterable(combinations(s, r) for r in lengths
                               if not shuffle(s))
```

(This is a bit of an ugly hack: shuffle(s) shuffles in place and always returns None, which is falsy, so we can add `not shuffle(s)` as a condition to make sure the input is reshuffled before every combinations() call.)
We pre-generate the list of lengths so that we can shuffle it as well.
This is not entirely random (some order remains: all subsets of length n cluster together, for example, and within a cluster the subsets follow the random order of the input), but there is a lot of randomness, if that is enough for you.
Output Example:
```python
>>> list(random_powerset(range(3)))
[(), (2,), (0,), (1,), (2, 1), (2, 0), (1, 0), (1, 2, 0)]
>>> list(random_powerset(range(3)))
[(), (0, 1), (0, 2), (1, 2), (0, 1, 2), (2,), (0,), (1,)]
>>> list(random_powerset(range(3)))
[(0, 1, 2), (2,), (1,), (0,), (0, 2), (0, 1), (2, 1), ()]
>>> list(random_powerset(range(3)))
[(1, 2, 0), (0,), (2,), (1,), (), (0, 1), (0, 2), (1, 2)]
>>> list(random_powerset(range(3)))
[(), (2, 1), (2, 0), (1, 0), (0,), (2,), (1,), (2, 1, 0)]
>>> list(random_powerset(range(3)))
[(1, 0), (1, 2), (0, 2), (0, 2, 1), (), (1,), (0,), (2,)]
```

I think this is the best you can do without giving up laziness.
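The same idea can be sketched without the falsy-shuffle condition by writing it as a plain generator (my restructuring, not the answer's original code; it is still lazy and has the same clustering caveat):

```python
from itertools import combinations
from random import shuffle

def random_powerset(iterable):
    s = list(iterable)
    lengths = list(range(len(s) + 1))
    shuffle(lengths)            # random order of subset sizes
    for r in lengths:
        shuffle(s)              # reshuffle the input before each size
        for combo in combinations(s, r):
            yield combo
```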
Here's another idea: store the combination generators and advance a randomly chosen one until everything is consumed. This also randomizes the order across the different sizes.
Edit: I assume that you do not care about the order of the elements within a set, since you sum them up. If you do, you can shuffle next_value (converted to a list) before yielding it.
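Shuffling within each yielded set would look something like this sketch (random.shuffle mutates a list in place, so the tuple coming out of combinations() has to be copied first; `shuffled_copy` is a hypothetical helper name):

```python
import random

def shuffled_copy(combo):
    # combinations() yields tuples; copy to a list so shuffle can mutate it
    values = list(combo)
    random.shuffle(values)
    return values
```

Inside the generator you would then `yield shuffled_copy(next_value)` instead of `yield next_value`.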
```python
import itertools
import random

def random_powerset(l):
    combs = [itertools.combinations(l, i) for i in range(len(l)+1)]
    while combs:
        comb_index = random.choice(range(len(combs)))
        try:
            next_value = next(combs[comb_index])
            yield next_value
        except StopIteration:
            combs.pop(comb_index)
```

Output:
```python
In : list(random_powerset(range(3)))
Out: [(0, 1), (0, 2), (0, 1, 2), (1, 2), (), (0,), (1,), (2,)]
In : list(random_powerset(range(3)))
Out: [(0, 1, 2), (0,), (), (0, 1), (1,), (0, 2), (1, 2), (2,)]
In : list(random_powerset(range(3)))
Out: [(0, 1), (0, 1, 2), (0, 2), (), (0,), (1,), (1, 2), (2,)]
In : list(random_powerset(range(3)))
Out: [(), (0,), (0, 1), (0, 1, 2), (1,), (0, 2), (2,), (1, 2)]
In : list(random_powerset(range(3)))
Out: [(), (0, 1), (0,), (0, 1, 2), (1,), (0, 2), (2,), (1, 2)]
In : list(random_powerset(range(3)))
Out: [(0, 1), (0,), (0, 2), (1, 2), (), (1,), (2,), (0, 1, 2)]
In : list(random_powerset(range(3)))
Out: [(), (0, 1, 2), (0,), (1,), (2,), (0, 1), (0, 2), (1, 2)]
```

You can improve the Lattyware solution a little if you go beyond itertools.chain:
```python
from itertools import combinations
from random import shuffle

def chain_random(iterables):
    iterables = list(iterables)
    icount = len(iterables)
    if icount == 0:
        return
    while icount > 1:
        shuffle(iterables)
        try:
            yield next(iterables[-1])
        except StopIteration:
            iterables.pop()
            icount -= 1
    # only one iterable left: just drain it
    for element in iterables[0]:
        yield element

def random_powerset(iterable):
    s = list(iterable)
    lengths = list(range(len(s)+1))
    shuffle(lengths)
    return chain_random(combinations(s, r) for r in lengths
                        if not shuffle(s))
```

Output Example:
```python
>>> list(random_powerset(range(3)))
[(), (2, 1, 0), (1, 0), (1, 2), (2,), (0, 2), (1,), (0,)]
>>> list(random_powerset(range(3)))
[(1, 0), (1, 2), (0, 2, 1), (2,), (), (0, 2), (0,), (1,)]
>>> list(random_powerset(range(3)))
[(0, 1), (), (0, 2), (0,), (1, 2), (2, 0, 1), (1,), (2,)]
>>> list(random_powerset(range(3)))
[(), (1, 2), (2,), (1, 0), (0,), (2, 0), (1,), (1, 0, 2)]
>>> list(random_powerset(range(3)))
[(0, 1), (), (2,), (0, 2), (1, 2), (1,), (1, 2, 0), (0,)]
>>> list(random_powerset(range(3)))
[(0, 2, 1), (0,), (), (2, 0), (1,), (2, 1), (2,), (0, 1)]
```

itertools is written in C, so chain_random will be slower than itertools.chain, but you get more randomization this way.
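For contrast, the fully random but non-lazy baseline is a one-off shuffle of the materialized powerset. This sketch shows what all the lazy approaches above are trying to avoid, namely holding all 2**n subsets in memory at once:

```python
import random
from itertools import chain, combinations

def shuffled_powerset(iterable):
    # Not lazy: builds every subset up front, then shuffles uniformly.
    s = list(iterable)
    subsets = list(chain.from_iterable(
        combinations(s, r) for r in range(len(s) + 1)))
    random.shuffle(subsets)
    return subsets
```

Unlike the lazy variants, this gives a uniformly random ordering with no clustering by subset size.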
This is a lazy and random solution:
```python
import random

def powerset(seq):
    n = 2**len(seq)
    used = set()
    while len(used) < n:
        choice = random.randint(0, n - 1)
        if choice not in used:
            used.add(choice)
            binary = bin(choice)[2:].zfill(len(seq))
            yield [i[1] for i in zip(binary, seq) if i[0] == '1']
            # or, with itertools (note the int conversion --
            # the character '0' is truthy):
            # yield list(itertools.compress(seq, map(int, binary)))

print(list(powerset([1, 2, 3])))
print(list(powerset([1, 2, 3])))
# output:
# [[3], [1], [2, 3], [], [1, 2], [2], [1, 3], [1, 2, 3]]
# [[2, 3], [1, 3], [1], [1, 2, 3], [1, 2], [2], [3], []]
```

If you consider the combination [1, 2, 3] in binary format:
```
n    123
0    000
1    001
2    010
3    011
4    100
5    101
6    110
7    111
```

Each combination can be marked with a unique binary identifier, and there are always 2**len(seq) combinations. So:
- Randomly select an integer between 0 and 2**len(seq) - 1.
- Check that we have not used it before (if we have, draw again).
- Convert it to binary.
- Zip it with seq.
- Where the zipped binary digit is '0', exclude that element from the output.
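The decoding steps above can be sketched with itertools.compress (`subset_from_index` is a hypothetical helper name; the mask characters must be converted to ints, since the character '0' is truthy):

```python
from itertools import compress

def subset_from_index(seq, k):
    # k in [0, 2**len(seq)) selects a subset via its binary mask
    mask = bin(k)[2:].zfill(len(seq))
    return list(compress(seq, map(int, mask)))

print(subset_from_index([1, 2, 3], 5))  # mask '101' -> [1, 3]
```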
This is lazy and will work for large seq.
A little caution:
One problem may occur, though it probably doesn't matter for you: toward the end of the sequence, re-drawing may take a while. Since the probability of drawing an already-drawn number is (number of successful draws) / 2**len(seq), after g successful draws the expected number of draws needed to find an unused number is:
```
n / (n - g)    # where n = 2**len(seq)
```

This is fine if n is small, or, for large n, if g << n (one or both of these situations is very likely, so neither should be a big problem). In fact, with large n you can do without used and the repetition check altogether, since the expected number of draws before the first repetition approaches n**0.5.
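To put numbers on the n / (n - g) estimate (a quick check of my own, not part of the original answer):

```python
def expected_draws(n, g):
    # Expected draws to hit an unused number after g successful draws,
    # where n = 2**len(seq)
    return n / (n - g)

print(expected_draws(8, 7))         # last subset of a 3-element set: 8.0
print(expected_draws(2**20, 1000))  # large n with g << n: ~1.001
```

Only the very last few subsets of a small powerset are expensive; for large n with g << n the overhead is negligible.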