How can I improve my algorithm for generating multiset combinations?

How to optimize the next() and hasNext() methods in the next generator that creates limited multiset combinations? (I posted this in C ++, as well as Java, because the code is compatible with C ++ and does not have Java-specific elements that cannot be converted directly to C ++.

Specific areas of the algorithm that are problematic is the whole hasNext() method, which can be unnecessarily complex, and the line:

if( current[xSlot] > 0 ) aiItemsUsed[current[xSlot]]--;

which has an if statement, I think you can somehow delete it. I had an earlier version of the algorithm that had some backreference before the return statement and therefore had a much simpler hasNext() check, but I could not get this version to work.

The background of this algorithm is that it is very difficult to find. For example, in Knuth 7.2.1.3, he simply says that this can be done (and gives an exercise to prove that an algorithm is possible), but does not give an algorithm. Similarly, I have half a dozen extended texts on combinatorics (including Papadimitriou and Kreher / Stimson), and none of them provide an algorithm for generating multiset combinations. Kreher leaves this as an β€œexercise for the reader." In any case, if you can improve the algorithm as described above, or give a link to a working implementation that is more efficient than mine, I would appreciate it. Please give only iterative algorithms (no recursion, please).

 /** The iterator returns a 1-based array of integers. When the last combination is reached hasNext() will be false. * @param aiItems One-based array containing number of items available for each unique item type where aiItems[0] is the number of item types * @param ctSlots The number of slots into which the items go * @return The iterator which generates the 1-based array containing the combinations or null in the event of an error. */ public static java.util.Iterator<int[]> combination( final int[] aiItems, final int ctSlots ){ // multiset combination into a limited number of slots CombinatoricIterator<int[]> iterator = new CombinatoricIterator<int[]>(){ int xSlot; int xItemType; int ctItemType; int[] current = new int[ctSlots + 1]; int[] aiItemsUsed = new int[aiItems[0] + 1]; { reset(); current[0] = ctSlots; ctItemType = aiItems[0]; } public boolean hasNext(){ int xUseSlot = ctSlots; int iCurrentType = ctItemType; int ctItemsUsed = 0; int ctTotalItemsUsed = 0; while( true ){ int xUsedType = current[xUseSlot]; if( xUsedType != iCurrentType ) return true; ctItemsUsed++; ctTotalItemsUsed++; if( ctTotalItemsUsed == ctSlots ) return false; if( ctItemsUsed == aiItems[xUsedType] ){ iCurrentType--; ctItemsUsed = 0; } xUseSlot--; } } public int[] next(){ while( true ){ while( xItemType == ctItemType ){ xSlot--; xItemType = current[xSlot]; } xItemType++; while( true ){ while( aiItemsUsed[xItemType] == aiItems[xItemType] && xItemType != current[xSlot] ){ while( xItemType == ctItemType ){ xSlot--; xItemType = current[xSlot]; } xItemType++; } if( current[xSlot] > 0 ) aiItemsUsed[current[xSlot]]--; current[xSlot] = xItemType; aiItemsUsed[xItemType]++; if( xSlot == ctSlots ){ return current; } xSlot++; } } } public int[] get(){ return current; } public void remove(){} public void set( int[] current ){ this.current = current; } public void setValues( int[] current ){ if( this.current == null || this.current.length != current.length ) this.current = new int[current.length]; System.arraycopy( current, 0, this.current, 0, current.length ); } public void reset(){ xSlot = 1; xItemType = 0; Arrays.fill( current, 0 ); current[0] = ctSlots; Arrays.fill( aiItemsUsed, 0 ); aiItemsUsed[0] = aiItems[0]; } }; return iterator; } 

ADDITIONAL INFORMATION

Some of the respondents do not yet understand the difference between multitude and limited multiset. A limited multiset has repeating elements. For example, {a, a, b, b, b, c} is a limited multiset that will be encoded as {3, 2, 3, 1} in my algorithm. Note that the leading β€œ3” is the number of element types (unique elements) in the set. If you provide an algorithm, then the next test should produce the output, as shown below.

  private static void combination_multiset_test(){ int[] aiItems = { 4, 3, 2, 1, 1 }; int iSlots = 4; java.util.Iterator<int[]> iterator = combination( aiItems, iSlots ); if( iterator == null ){ System.out.println( "null" ); System.exit( -1 ); } int xCombination = 0; while( iterator.hasNext() ){ xCombination++; int[] combination = iterator.next(); if( combination == null ){ System.out.println( "improper termination, no result" ); System.exit( -1 ); } System.out.println( xCombination + ": " + Arrays.toString( combination ) ); } System.out.println( "complete" ); } 1: [4, 1, 1, 1, 2] 2: [4, 1, 1, 1, 3] 3: [4, 1, 1, 1, 4] 4: [4, 1, 1, 2, 2] 5: [4, 1, 1, 2, 3] 6: [4, 1, 1, 2, 4] 7: [4, 1, 1, 3, 4] 8: [4, 1, 2, 2, 3] 9: [4, 1, 2, 2, 4] 10: [4, 1, 2, 3, 4] 11: [4, 2, 2, 3, 4] complete 
+7
source share
3 answers

EDIT : customize the answer according to the clarified question

The main idea: again, the resulting selection can be encoded in the same way as the special numeral system. One could increase the counter and interpret this counter as a choice.

However, since there is an additional limit on the size of the allocation == target . A naive way to implement the constraint is to simply check the size of the resulting selection and skip those that do not satisfy the constraints. But it is slow.

So, all I did was make a slightly smarter increment that jumps the selection with the correct size directly.

Sorry, the code is in Python. But I did it the same way as the Java iterator interface. Input and output format:

 haves[i] := multiplicity of the i-th item in the collection target := output collection must have this size 

The code:

 class Perm(object): def __init__(self,items,haves,target): assert sum(haves) >= target assert all(h > 0 for h in haves) self.items = items self.haves = haves self.target = target self.ans = None self.stop = False def __iter__(self): return self def reset(self): self.ans = [0]*len(self.haves) self.__fill(self.target) self.stop = False def __fill(self,n): """fill ans from LSB with n bits""" if n <= 0: return i = 0 while n > self.haves[i]: assert self.ans[i] == 0 self.ans[i] = self.haves[i] n -= self.haves[i] i += 1 assert self.ans[i] == 0 self.ans[i] = n def __inc(self): """increment from LSB, carry when 'target' or 'haves' constrain is broken""" # in fact, the 'target' constrain is always broken on the left most non-zero entry # find left most non-zero i = 0 while self.ans[i] == 0: i += 1 # set it to zero l = self.ans[i] self.ans[i] = 0 # do increment answer, and carry while True: # increment to the next entry, if possible i += 1 if i >= len(self.ans): self.stop = True raise StopIteration # if self.ans[i] == self.haves[i]: l += self.ans[i] self.ans[i] = 0 else: l -= 1 self.ans[i] += 1 break return l def next(self): if self.stop: raise StopIteration elif self.ans is None: self.reset() else: l = self.__inc() self.__fill(l) return self.ans 

Note that the items argument is not used.

assert in __init__ is to exclude my suggestion about input.

assert in __fill should just show the convenient self.ans property in the context invoked by __fill .

Here is a good skeleton for code testing:

 test_cases = [([3,2,1], 3), ([3,2,1], 5), ([3,2,1], 6), ([4,3,2,1,1], 4), ([1,3,1,2,4], 4), ] P = Perm(None,*test_cases[-1]) for p in P: print p #raw_input() 

Result from input ([1,3,1,2,4], 4) :

 [1, 3, 0, 0, 0] [1, 2, 1, 0, 0] [0, 3, 1, 0, 0] [1, 2, 0, 1, 0] [0, 3, 0, 1, 0] [1, 1, 1, 1, 0] [0, 2, 1, 1, 0] [1, 1, 0, 2, 0] [0, 2, 0, 2, 0] [1, 0, 1, 2, 0] [0, 1, 1, 2, 0] [1, 2, 0, 0, 1] [0, 3, 0, 0, 1] [1, 1, 1, 0, 1] [0, 2, 1, 0, 1] [1, 1, 0, 1, 1] [0, 2, 0, 1, 1] [1, 0, 1, 1, 1] [0, 1, 1, 1, 1] [1, 0, 0, 2, 1] [0, 1, 0, 2, 1] [0, 0, 1, 2, 1] [1, 1, 0, 0, 2] [0, 2, 0, 0, 2] [1, 0, 1, 0, 2] [0, 1, 1, 0, 2] [1, 0, 0, 1, 2] [0, 1, 0, 1, 2] [0, 0, 1, 1, 2] [0, 0, 0, 2, 2] [1, 0, 0, 0, 3] [0, 1, 0, 0, 3] [0, 0, 1, 0, 3] [0, 0, 0, 1, 3] [0, 0, 0, 0, 4] 

Performance Each call to next() takes O(h) , where h is the number of element types ( haves list size).

+1
source

I would write a simple helper class that does increment , highbit and for_each_bit .

First I wrapped the unsigned int and limited it to 32 bits and maybe expanded it to std::bitset or std::vector<uint32_t> if I felt ambitious - but if you have these 3 methods, I can check it out and make it work.

increment easy, esp on a bare 32-bit int.

highbit returns the bit position of the most highbit bit.

for_each_bit has this signature in C ++:

 template<typename Lambda> void for_each_bit( my_bignum const& num, Lambda&& func ) 

and it calls func with the index of each set bit in num .

It takes no more than a few minutes to write.

Drop hasNext , follow the concept of an iterator - you have a subset of begin and a subset of end , and end not valid for retrieving a value. Separating these iterators calls the corresponding subset (or creates a factory for the specified subset).

end now easy to work out - if highbit has> = the number of elements in your set, you are at the end of the permutations.

begin either zero or 1, depending on whether you want to include empty subsets.

next just increases your bignum .

Creating a subset simply involves calling for_each_bit and putting that element from your set into the subset.

Then improve increment to allow random access, and then you can iterate over parallel subsets!

This solves a given problem. To solve the mutttiset problem, first solve the problem with the derived set (where you pretend there are only 0 or 1 of each element), and repeat this. Then, at each iteration of the derived set, create a std::vector maximum number of each element.

Then do something like this:

 #include <utility> #include <cstddef> #include <vector> using std::size_t; namespace details { template<typename Lambda> void for_each_multiset_combo_worker( std::vector<size_t> const& counts, Lambda&& lambda, std::vector<size_t>& indexes, std::vector<size_t>& current ) { if (depth >= counts.size()) { lambda( current ); return; } for (size_t i = 0; i <= counts[depth]; ++i) { // Assert: current.size() == depth current.push_back(i); // Assert: current.back() == i // Assert: current.size() == dpeth+1 for_each_multiset_combo_worker( counts, lambda, depth+1, current ); // Assert: current.back() == i // Assert: current.size() == dpeth+1 current.pop_back(); // Assert: current.size() == depth } } } template<typename Lambda> void for_each_multiset_combo( std::vector<size_t> const& counts, Lambda&& lambda ) { std::vector<size_t> current; current.reserve( counts.size() ); details::for_each_multiset_combo_worker( counts, std::forward<Lambda>(lambda), 0, current ); } #include <iostream> int main() { std::vector<size_t> multiset = {3, 2, 1, 1}; size_t counter = 0; for_each_multiset_combo( multiset, [&]( std::vector<size_t> const& counts ){ std::cout << counter << ": ["; for(auto it = counts.begin(); it != counts.end(); ++it) { if (it != counts.begin()) { std::cout << ", "; } std::cout << *it; } std::cout << "]\n"; ++counter; }); } 

Real-time example: http://ideone.com/8GN1xx

In this living example, I first skipped the optimization of iterating over the set, and instead went straight through the multiset.

(Limitations: no more than max size_t elements of each type and no more than the maximum capacity std::vector different types of elements).

I do not need to keep the "number of individual elements in the multiset", so I did not use it.

And here is an iterative version of the aforementioned recursive algorithm, using the usual method of "turning stacks of implicit recursion into an explicit iteration method":

 #include <utility> #include <cstddef> #include <vector> using std::size_t; template<typename Lambda> void for_each_multiset_combo( std::vector<size_t> const& counts, Lambda&& lambda ) { // below code is easier if I assume counts is non-empty: if (counts.empty()) { lambda(counts); return; } // preallocate a buffer big enough to hold the output counts: std::vector<size_t> indexes; indexes.reserve( counts.size() ); while(true) { // append 0s on the end of indexes if we have room: while (indexes.size() < counts.size()) { indexes.push_back(0); } // at this point, we have a unique element. Pass it to the passed in lambda: lambda( indexes ); // The advancement logic. Advance the highest index. If that overflows, pop it and // advance the next highest index: indexes.back()++; while (indexes.back() > counts[indexes.size()-1]) { indexes.pop_back(); // we are done if we have managed to advance every index, and there are none left to advance: if (indexes.empty()) return; // finished indexes.back()++; } } } #include <iostream> int main() { std::vector<size_t> multiset = {3, 2, 1, 1}; size_t counter = 0; for_each_multiset_combo( multiset, [&]( std::vector<size_t> const& counts ){ std::cout << counter << ": ["; for(auto it = counts.begin(); it != counts.end(); ++it) { if (it != counts.begin()) { std::cout << ", "; } std::cout << *it; } std::cout << "]\n"; ++counter; }); } 

http://ideone.com/x2Zp2f

+1
source

This document provides an efficient iterative algorithm for generating multi-position permutations on page 8

This article provides another iterative algorithm, also on page 8.

0
source

All Articles