Efficiently generating combinations of N colored elements with a limited number of colors

Given a set of N elements, each colored with one of C colors, how can I find every possible combination of length L that contains at most M distinct colors?

I tried the algorithm below, which uses itertools.combinations to generate all possible combinations and then filters out those that exceed the maximum number of colors.

    from itertools import combinations as cb

    def allowed_combinations(elements, combination_size=4, max_colors=3):
        for combination in cb(elements, combination_size):
            colors = set(elements[element] for element in combination)
            if len(colors) > max_colors:
                continue
            yield combination

    elements = dict()
    elements['A'] = 'red'
    elements['B'] = 'red'
    elements['C'] = 'blue'
    elements['D'] = 'blue'
    elements['E'] = 'green'
    elements['F'] = 'green'
    elements['G'] = 'green'
    elements['H'] = 'yellow'
    elements['I'] = 'white'
    elements['J'] = 'white'
    elements['K'] = 'black'

    for c in allowed_combinations(elements):
        for element in c:
            print("%s-%s" % (element, elements[element]))
        print()

The output is as follows:

    A-red
    C-blue
    B-red
    E-green

    A-red
    C-blue
    B-red
    D-blue

    A-red
    C-blue
    B-red
    G-green

    A-red
    C-blue
    B-red
    F-green

    ...

The problem is that generating all possible combinations can be very expensive. In my case, for example, L is often 6 and the number of elements N is 50, which gives Bin(50, 6) = 15,890,700 possible combinations. If the maximum number of colors allowed in a combination is small, most combinations are "useless" and get discarded at the filtering stage. My intuition is that I need to move the filtering step inside/before the combinatorial step, to avoid ever generating the doomed combinations, but I don't see how.
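As a quick sanity check of that count (a minimal sketch; math.comb requires Python 3.8 or later):

    import math

    # Bin(50, 6): number of 6-element subsets of 50 elements
    print(math.comb(50, 6))  # 15890700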

python algorithm combinations combinatorics
3 answers

Combinatorial problems are notorious for being easy to state but potentially hard to solve. For this one I wouldn't use itertools at all, but would instead write a custom generator. For example,

    def combs(elt2color, combination_size=4, max_colors=3):
        def inner(needed, index):
            if needed == 0:
                yield result
                return
            if n - index < needed:
                # not enough elements remain to reach combination_size
                return
            # first all results that don't contain elts[index]
            for _ in inner(needed, index + 1):
                yield result
            # and then all results that do contain elts[index]
            needed -= 1
            elt = elts[index]
            color = elt2color[elt]
            color_added = color not in colors_seen
            colors_seen.add(color)
            if len(colors_seen) <= max_colors:
                result[needed] = elt
                for _ in inner(needed, index + 1):
                    yield result
            if color_added:
                colors_seen.remove(color)

        elts = tuple(elt2color)
        n = len(elts)
        colors_seen = set()
        result = [None] * combination_size
        for _ in inner(combination_size, 0):
            yield tuple(result)

Then:

    elt2color = dict([('A', 'red'), ('B', 'red'), ('C', 'blue'), ('D', 'blue'),
                      ('E', 'green'), ('F', 'green'), ('G', 'green'),
                      ('H', 'yellow'), ('I', 'white'), ('J', 'white'),
                      ('K', 'black')])

    for c in combs(elt2color):
        for element in c:
            print("%s-%s" % (element, elt2color[element]))
        print()

produces the same 188 combinations as your post-processing code, but internally it abandons a partial combination as soon as it would span more than max_colors colors. There is no way to change what the itertools functions do internally, so when you need to control that, you have to roll your own.
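If you want to convince yourself that the two approaches agree, here is a quick cross-check sketch (assuming allowed_combinations from the question and combs from above are both in scope):

    # Normalize each combination to a sorted tuple so the two
    # generators' different output orders don't matter.
    expected = sorted(tuple(sorted(c)) for c in allowed_combinations(elt2color))
    got = sorted(tuple(sorted(c)) for c in combs(elt2color))
    assert expected == got
    print(len(got))  # 188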

Using itertools

Here's another approach: generate all solutions with exactly 1 color first, then all solutions with exactly 2 colors, and so on. itertools can be used directly for most of this, but at the lowest level a custom generator is still required. I find this harder to understand than the fully custom generator, but it may be clearer to you:

    def combs2(elt2color, combination_size=4, max_colors=3):
        from collections import defaultdict
        from itertools import combinations

        color2elts = defaultdict(list)
        for elt, color in elt2color.items():
            color2elts[color].append(elt)

        def at_least_one_from_each(iterables, n):
            if n < len(iterables):
                return  # impossible
            if not n or not iterables:
                if not n and not iterables:
                    yield ()
                return
            # Must have n - num_from_first >= len(iterables) - 1,
            # so num_from_first <= n - len(iterables) + 1
            for num_from_first in range(1, min(len(iterables[0]) + 1,
                                               n - len(iterables) + 2)):
                for from_first in combinations(iterables[0], num_from_first):
                    for rest in at_least_one_from_each(iterables[1:],
                                                       n - num_from_first):
                        yield from_first + rest

        for numcolors in range(1, max_colors + 1):
            for colors in combinations(color2elts, numcolors):
                # Now this gets tricky. We need to pick combination_size
                # elements across all the colors, but must pick at least
                # one from each color.
                for elements in at_least_one_from_each(
                        [color2elts[color] for color in colors],
                        combination_size):
                    yield elements
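Usage is the same as before; for example, a quick count with the elt2color dict from earlier:

    print(sum(1 for _ in combs2(elt2color)))  # 188, matching the other generators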

I didn't time them, because I don't care ;-) The fully custom generator reuses a single result list over and over to build each output, which cuts down the rate of dynamic memory churn. The second approach creates a lot of memory churn by pasting together multiple levels of from_first and rest tuples - and that's largely unavoidable, because it uses itertools to generate the from_first tuples at each level.

Internally, the C-coded itertools functions almost always work in a way more like the first code sample, and for the same reasons: they reuse an internal result buffer as much as possible.

AND ONE MORE

This illustrates some more subtleties. I thought about what I would do if I were to implement this functionality in C, as an itertools function. All the itertools functions were first prototyped in Python, but at a semi-low level, reduced to working with vectors of small integers (no "inner loop" use of sets, dicts, sequence slicing, or pasting together partial sequence results - sticking as much as possible to worst-case O(1)-time operations on dirt-simple native C types after initialization).

At a higher level, an itertools function would accept any iterable as its primary argument, and would almost certainly guarantee to yield the combinations in lexicographic index order. So here's code that does all that. Besides the iterable argument, it also takes an elt2ec mapping, which maps each element of the iterable to its equivalence class (for you, those are the strings naming colors, but any objects usable as dict keys can serve as equivalence classes):

    def combs3(iterable, elt2ec, k, maxec):
        # Generate all k-combinations from `iterable` spanning no
        # more than `maxec` equivalence classes.
        elts = tuple(iterable)
        n = len(elts)
        ec = [None] * n   # ec[i] is equiv class ordinal of elts[i]
        ec2j = {}         # map equiv class to its ordinal
        for i, elt in enumerate(elts):
            thisec = elt2ec[elt]
            j = ec2j.get(thisec)
            if j is None:
                j = len(ec2j)
                ec2j[thisec] = j
            ec[i] = j
        countec = [0] * len(ec2j)
        del ec2j

        def inner(i, j, totalec):
            if i == k:
                yield result
                return
            for j in range(j, jbound[i]):
                thisec = ec[j]
                thiscount = countec[thisec]
                newtotalec = totalec + (thiscount == 0)
                if newtotalec <= maxec:
                    countec[thisec] = thiscount + 1
                    result[i] = j
                    yield from inner(i + 1, j + 1, newtotalec)
                    countec[thisec] = thiscount

        jbound = list(range(n - k + 1, n + 1))
        result = [None] * k
        for _ in inner(0, 0, 0):
            yield (elts[i] for i in result)

(Note that this is Python 3 code.) As advertised, nothing in inner() is fancier than indexing a vector with a small integer. The only thing left to make it directly translatable to C is removing the recursive generation. That's tedious, and since it wouldn't illustrate anything particularly interesting here, I'll skip it.
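One wrinkle to be aware of: combs3() yields a generator expression for each result, and the underlying result list is reused, so consume each value before advancing the outer loop. A usage sketch with the running example (the dict serves both as the iterable, via its keys, and as the elt2ec mapping):

    for c in combs3(elt2color, elt2color, 4, 3):
        print(tuple(c))  # materialize each result before the next iteration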

In any case, the interesting part is the timing. As noted in a comment, timing results are strongly influenced by the test cases you use. combs3() is sometimes the fastest here, but not often! It's almost always faster than my original combs(), but usually slower than my combs2() or @GarethRees's lovely constrained_combinations().

So how can that be, when combs3() has been optimized almost all the way down to mindless ;-) C-level operations? Easy! It's still written in Python. combs2() and constrained_combinations() use the C-coded itertools.combinations() to do much of their work, and that makes a world of difference. combs3() would run circles around them if it were coded in C.

Of course any of these can be unboundedly faster than the allowed_combinations() in the original post - but that one can be the fastest too (for example, pick almost any inputs where max_colors is so large that no combinations are excluded: then allowed_combinations() wastes little effort, while all the others add substantial extra overhead to "optimize" pruning that never happens).
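For completeness, a minimal timing harness along these lines (a sketch only; it assumes combs, combs2 and combs3 from above are in scope, and the inputs chosen will heavily influence the results):

    from timeit import timeit

    elt2color = dict(enumerate('abcde' * 10))  # 50 elements, 5 colors

    def count(gen):
        return sum(1 for _ in gen)

    for f in (combs, combs2):
        t = timeit(lambda: count(f(elt2color, 6, 3)), number=1)
        print("%s: %.3f seconds" % (f.__name__, t))

    # combs3 has a different signature: (iterable, elt2ec, k, maxec)
    t = timeit(lambda: count(combs3(elt2color, elt2color, 6, 3)), number=1)
    print("combs3: %.3f seconds" % t)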


Here's an implementation that is slightly simpler than the other answers published so far. The basic approach is as follows:

  • Pick a value ("color" in your terminology) that has not yet been picked;
  • Loop over i , the number of keys ("elements") associated with that value that will be included in the output;
  • Loop over c , the combinations of those keys of length i ;
  • Recurse to pick the next value.
    from collections import defaultdict, deque
    from itertools import combinations

    def constrained_combinations(elements, r, s):
        """Generate distinct combinations of 'r' keys from the dictionary
        'elements' using at most 's' different values. The values must
        be hashable.

        >>> from collections import OrderedDict
        >>> elements = OrderedDict(enumerate('aabbc'))
        >>> cc = constrained_combinations
        >>> list(cc(elements, 2, 1))
        [(0, 1), (2, 3)]
        >>> list(cc(elements, 3, 2))
        [(0, 1, 2), (0, 1, 3), (0, 1, 4), (0, 2, 3), (1, 2, 3), (2, 3, 4)]
        >>> list(cc(elements, 3, 3)) == list(combinations(range(5), 3))
        True
        >>> sum(1 for _ in cc(OrderedDict(enumerate('aabbcccdeef')), 4, 3))
        188
        """
        # 'value_keys' is a map from value to a list of keys associated
        # with that value; 'values' is a list of values in reverse order
        # of first appearance.
        value_keys = defaultdict(list)
        values = deque()
        for k, v in elements.items():
            if v not in value_keys:
                values.appendleft(v)
            value_keys[v].append(k)

        def helper(current, r, s):
            if r == 0:
                yield current
                return
            if s == 0 or not values:
                return
            value = values.pop()
            keys = value_keys[value]
            for i in range(min(r, len(keys)), -1, -1):
                for c in combinations(keys, i):
                    for result in helper(current + c, r - i, s - min(i, 1)):
                        yield result
            values.append(value)

        return helper((), r, s)

Notes

  • In Python 3.3 or later, you can use yield from to simplify the recursive call:

     yield from helper(current + c, r - i, s - min(i, 1)) 
  • If you're wondering why the doctests use collections.OrderedDict : it's so that the combinations are returned in a predictable order, which is necessary for the tests to pass.


  • The code reverses the list of values, and iterates downwards over i , so that if the caller passes in an OrderedDict , the combinations are returned in a sensible order (with values that appear early in the input also appearing early in the output).

  • Given the slight awkwardness involved in getting predictable output from this function, I think it would be worth considering a change to the interface: instead of taking a dictionary mapping keys to values, it could take an iterable of (key, value) pairs (see the sketch below).
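A minimal sketch of that alternative interface, as a hypothetical wrapper (the name constrained_combinations_pairs is mine, not part of the answer):

    from collections import OrderedDict

    def constrained_combinations_pairs(pairs, r, s):
        """Like constrained_combinations, but takes an iterable of
        (key, value) pairs, so the caller controls the key order
        directly."""
        return constrained_combinations(OrderedDict(pairs), r, s)

    # Usage: equivalent to the first doctest above.
    print(list(constrained_combinations_pairs(enumerate('aabbc'), 2, 1)))
    # [(0, 1), (2, 3)]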

Performance

Broadly speaking, this performs similarly to Tim Peters's combs2 :

    >>> from timeit import timeit
    >>> elements = dict(enumerate('abcde' * 10))
    >>> test = lambda f: timeit(lambda: sum(1 for _ in f(elements, 6, 3)), number=1)
    >>> test(combs2)
    11.403807007009163
    >>> test(constrained_combinations)
    11.38378801709041

Rough outline.

You have C different colors in total. For each k with 1 <= k <= M , choose k colors, which can be done in Bin(C, k) ways. (I'm using your notation here, with Bin denoting the binomial coefficient.)

For each of the choices above, collect all the elements having the chosen colors. Say that gives P distinct elements. Then choose L of those P elements, in Bin(P, L) different ways.

All of the above is subject to the obvious sanity checks: M <= C , L <= P , and so on.

The advantage of this approach is that it generates only valid combinations, and each valid combination is generated exactly once. (Edit: as pointed out in a comment, the second part is not true - duplicate combinations can be generated.)

PS. And here is an implementation of the above algorithm, with a fix for the duplicate combinations:

    from itertools import combinations

    elts = {'A': 'red', 'B': 'red', 'C': 'blue', 'D': 'blue',
            'E': 'green', 'F': 'green', 'G': 'green', 'H': 'yellow',
            'I': 'white', 'J': 'white', 'K': 'black'}

    def combs(elts, size=4, max_colors=3):
        # Collect the set of different colors
        colors = {}
        for e in elts.values():
            colors[e] = 1
        # For each number of colors k between 1 and 'max_colors'
        for k in range(1, max_colors + 1):
            # Choose 'k' different colors
            for selected_colors in combinations(colors, k):
                # Select all the elements with these colors
                selected_elts = []
                for e, c in elts.items():
                    if c in selected_colors:
                        selected_elts.append(e)
                # Choose 'size' of these elements
                for chosen_elts in combinations(selected_elts, size):
                    # Check that the chosen elements span exactly 'k'
                    # different colors, so each combination is yielded
                    # only once
                    t = {}
                    for e in chosen_elts:
                        t[elts[e]] = 1
                    if len(t) == k:
                        yield chosen_elts

    #for e in combs(elts):
    #    print(e)

    print(len(list(combs(elts))))

PS. I also timed Tim's combs2 , my own combs , and Gareth's constrained_combinations using a test program, with these results (times in seconds, followed by result counts):

    combs2       = 5.214529
    constr combs = 5.290079
    combs        = 4.952063

    combs2       = 5165700
    constr combs = 5165700
    combs        = 5165700
