The fastest way to find unique list combinations

Question

The fastest way to find unique list combinations

I am trying to solve the general problem of getting unique combinations from a list in Python

Mathematically from https://www.mathsisfun.com/combinatorics/combinations-permutations-calculator.html I see that the formula for the number of combinations is n!/r!(nr)! , where n is the length of the sequence and r is the number to be selected.

As shown in the following python, where n is 4 and r is 2:

 lst = 'ABCD' result = list(itertools.combinations(lst, len(lst)/2)) print len(result) 6

Below is a helper function to display the problem that I have:

 def C(lst): l = list(itertools.combinations(sorted(lst), len(lst)/2)) s = set(l) print 'actual', len(l), l print 'unique', len(s), list(s)

If I ran this from iPython, I can name it like this:

 In [41]: C('ABCD') actual 6 [('A', 'B'), ('A', 'C'), ('A', 'D'), ('B', 'C'), ('B', 'D'), ('C', 'D')] unique 6 [('B', 'C'), ('C', 'D'), ('A', 'D'), ('A', 'B'), ('A', 'C'), ('B', 'D')] In [42]: C('ABAB') actual 6 [('A', 'A'), ('A', 'B'), ('A', 'B'), ('A', 'B'), ('A', 'B'), ('B', 'B')] unique 3 [('A', 'B'), ('A', 'A'), ('B', 'B')] In [43]: C('ABBB') actual 6 [('A', 'B'), ('A', 'B'), ('A', 'B'), ('B', 'B'), ('B', 'B'), ('B', 'B')] unique 2 [('A', 'B'), ('B', 'B')] In [44]: C('AAAA') actual 6 [('A', 'A'), ('A', 'A'), ('A', 'A'), ('A', 'A'), ('A', 'A'), ('A', 'A')] unique 1 [('A', 'A')]

What I want to get is a unique count, as shown above, but the execution of combinations and then set does not scale. Since, when the length lst , which n increases, it slows down as the combinations get bigger and bigger.

Is there a way to use math or Python tricks to solve the problem of counting unique combinations?

+7

python math

sotapme Feb 03 '18 at 22:26

source share

2 answers

Start with a regular recursive definition of combinations (), but add a test only for recursion when the initial value at this level has not been used before:

 def uniq_comb(pool, r): """ Return an iterator over a all distinct r-length combinations taken from a pool of values that may contain duplicates. Unlike itertools.combinations(), element uniqueness is determined by value rather than by position. """ if r: seen = set() for i, item in enumerate(pool): if item not in seen: seen.add(item) for tail in uniq_comb(pool[i+1:], r-1): yield (item,) + tail else: yield () if __name__ == '__main__': from itertools import combinations pool = 'ABRACADABRA' for r in range(len(pool) + 1): assert set(uniq_comb(pool, r)) == set(combinations(pool, r)) assert dict.fromkeys(uniq_comb(pool, r)) == dict.fromkeys(combinations(pool, r))

+3

Raymond hettinger Feb 03 '18 at 23:10

source share

Mark dickinson · Accepted Answer · 2018-02-04T20:10:01+0000

Here's the Python code based on the generating function method described in this Math Forum article . For each letter appearing at the input, we create a polynomial 1 + x + x^2 + ... + x^k , where k is the number of times the letter appears. Then we multiply these polynomials together: the nth coefficient of the resulting polynomial then tells you how many combinations of length n are.

We will represent the polynomial simply as a list of its (integer) coefficients with the first coefficient representing a constant term, the next coefficient representing the coefficient at x , and so on. We will need to be able to multiply such polynomials, so here for this function:

 def polymul(p, q): """ Multiply two polynomials, represented as lists of coefficients. """ r = [0]*(len(p) + len(q) - 1) for i, c in enumerate(p): for j, d in enumerate(q): r[i+j] += c*d return r

In the above example, the following function calculates the number of combinations:

 from collections import Counter from functools import reduce def ncombinations(it, k): """ Number of combinations of length *k* of the elements of *it*. """ counts = Counter(it).values() prod = reduce(polymul, [[1]*(count+1) for count in counts], [1]) return prod[k] if k < len(prod) else 0

Testing this example:

 >>> ncombinations("abcd", 2) 6 >>> ncombinations("abab", 2) 3 >>> ncombinations("abbb", 2) 2 >>> ncombinations("aaaa", 2) 1

And on some longer examples, demonstrating that this approach is possible even for long-tailed inputs:

 >>> ncombinations("abbccc", 3) # the math forum example 6 >>> ncombinations("supercalifragilisticexpialidocious", 10) 334640 >>> from itertools import combinations # double check ... >>> len(set(combinations(sorted("supercalifragilisticexpialidocious"), 10))) 334640 >>> ncombinations("supercalifragilisticexpialidocious", 20) 1223225 >>> ncombinations("supercalifragilisticexpialidocious", 34) 1 >>> ncombinations("supercalifragilisticexpialidocious", 35) 0 >>> from string import printable >>> ncombinations(printable, 50) # len(printable)==100 100891344545564193334812497256 >>> from math import factorial >>> factorial(100)//factorial(50)**2 # double check the result 100891344545564193334812497256 >>> ncombinations("abc"*100, 100) 5151 >>> factorial(102)//factorial(2)//factorial(100) # double check (bars and stars) 5151

The fastest way to find unique list combinations

More articles: