Efficient enumeration of subsets

I am currently working on an algorithm for a mathematical optimization problem and must deal with the following situation.

At many points, the algorithm must decide which subset S ⊂ N is currently the best, where:

N = {0, 1, 2, ..., 126, 127}
|S| ∈ {0, 1, 2, 3, 4, 5} (the size of the subset is between 0 and 5)

This gives a total of C(128, 5) + C(128, 4) + ... + C(128, 0) = 275,584,033 possible subsets.

If I precompute all possible subsets and store them in an array, that array will have 275,584,033 entries and take about 1.3 GB of memory with no optimization, storing each subset naively as a byte array.

In that case, whenever the algorithm needs to know which elements are in the subset with index i, only a table lookup is required. However, such a huge memory requirement is unacceptable.
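For reference, the subset count and the naive memory estimate can be reproduced with a few lines of Python (math.comb needs Python 3.8+; the 5-bytes-per-record figure is my assumption of one byte per element, padded):

    import math

    # Subsets of {0, ..., 127} with at most 5 elements.
    total = sum(math.comb(128, k) for k in range(6))
    print(total)                  # 275584033

    # Naive storage at 5 bytes per subset (one byte per element, padded):
    print(total * 5 / 2**30)      # ~1.28 GiB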

So my question is basically whether anyone can think of an algorithm that efficiently computes the elements of a subset from its index, instead of requiring a precomputed array.


EDIT: included samples (a sketch of a naive construction follows the list):
lookupTable [0] = {}
lookupTable [1] = {0}
...
lookupTable [127] = {126}
lookupTable [128] = {127}
lookupTable [129] = {0, 1}
lookupTable [130] = {0, 2}
...
lookupTable [275584032] = {123, 124, 125, 126, 127}
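For illustration, a naive construction that reproduces exactly this ordering (sizes ascending, lexicographic within each size) could look like the sketch below; it is only meant to pin down the ordering, since materializing the full list is exactly the memory problem described above:

    from itertools import combinations

    # Builds the full lookup table: sizes 0..max_size, lexicographic within
    # each size.  Do not actually run this for n = 128 -- it is the
    # memory-hungry approach the question wants to avoid.
    def build_lookup_table(n=128, max_size=5):
        return [s for k in range(max_size + 1) for s in combinations(range(n), k)]

    small = build_lookup_table(n=8, max_size=3)   # small enough to materialize
    print(small[0])     # ()
    print(small[9])     # (0, 1) -- the first 2-element subset
    print(small[-1])    # (5, 6, 7)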

2 answers

You can construct each subset directly from the previous one. It is also easy to represent a subset as a 128-bit number (although, obviously, most 128-bit values will not map to a qualifying subset, and I don't know whether the 128 in the question is real or just an example). That is certainly the first approach I would try; if it works, everything is O(1) and the storage cost is not extreme (16 bytes per index instead of 4).
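A minimal sketch of that first approach in Python (arbitrary-precision ints stand in for the 128-bit number; the function names are mine):

    # The subset is its own "index": a 128-bit mask with at most 5 bits set.

    def subset_to_mask(subset):
        """Encode a subset of {0, ..., 127} as a 128-bit mask."""
        mask = 0
        for element in subset:
            mask |= 1 << element
        return mask

    def mask_to_subset(mask):
        """Decode the mask back into a sorted list of elements (scans 128 bits)."""
        return [i for i in range(128) if mask & (1 << i)]

    mask = subset_to_mask({3, 17, 99})
    print(mask_to_subset(mask))      # [3, 17, 99]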

If you really want short indexes for the subsets, I would use the following recursion, where s(n, k) denotes the set of all subsets (and S(n, k) their count) of size k drawn from the values {0, ..., n-1}:

s(n,0) = { {} }
s(n,k) = (s(n-1,k-1) ⊙ {n-1}) ∪ s(n-1,k)   if n ≥ k > 0
s(n,k) = {}                                 if n < k

where the operator P ⊙ {x} means "add the element x to each subset in P" (so the result has exactly as many subsets as P). Considered as a counting recursion, this gives:

S(n,0) = 1
S(n,k) = S(n-1,k-1) + S(n-1,k) if n ≥ k > 0
S(n,k) = 0 if n < k
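As a sketch, the counting recursion translates directly into a memoized function (this is just C(n, k); the names are mine):

    from functools import lru_cache

    @lru_cache(maxsize=None)
    def S(n, k):
        """Number of k-element subsets of {0, ..., n-1}, via the recursion above."""
        if k == 0:
            return 1
        if n < k:
            return 0
        return S(n - 1, k - 1) + S(n - 1, k)

    print(S(128, 5))     # 264566400, i.e. C(128, 5)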

The second recursion can be re-expressed as:

S(n,k) = Σ i=k..n S(i-1, k-1)
(which would look better with jsMath, grrr.)

Another way to say this is that we generate the sets in order of their largest element: we start with the set {0...k-1}, then all the sets with k as the largest element, then all those with k+1, and so on. Within each group of sets, we recursively enumerate the remaining (k-1) elements, again starting with the lowest possible maximum and working up to one less than the maximum we just selected.
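Under that ordering, the index of a fixed-size set is just the sum of the sizes of all the groups that come before it at each level; a small sketch (using math.comb for S(n, k); the function name is mine):

    import math

    def colex_rank(subset):
        """Index of a size-k subset of {0, ..., n-1} in the order described
        above (grouped by largest element): the sum of C(c_j, j) over the
        sorted elements c_1 < c_2 < ... < c_k."""
        return sum(math.comb(c, j) for j, c in enumerate(sorted(subset), start=1))

    print(colex_rank({0, 1, 2, 3, 4}))    # 0, the very first 5-element set
    print(colex_rank({1, 3}))             # 4 within the k = 2 enumeration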

So, given an index into S(n,k), we can find the largest value in the corresponding set by subtracting S(i-1, k-1) for i = k, k+1, ... until the next subtraction would make the index negative; then i-1 is the largest element, so add it to the result set, decrease k by 1, and repeat with n now set to i-1.
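A direct sketch of that procedure (linear-scan version, with math.comb standing in for S(n, k); the names are mine):

    import math

    def subset_from_index(index, n=128, k=5):
        """Recover the size-k subset of {0, ..., n-1} with the given index in
        the enumeration described above (largest element determined first)."""
        assert 0 <= index < math.comb(n, k)
        subset = []
        while k > 0:
            # Subtract group sizes S(i-1, k-1) until the next one would
            # overshoot; i-1 is then the largest element of the remaining set.
            i = k
            while index >= math.comb(i - 1, k - 1):
                index -= math.comb(i - 1, k - 1)
                i += 1
            subset.append(i - 1)
            n, k = i - 1, k - 1      # continue with the values below it
        return sorted(subset)

    print(subset_from_index(0))                          # [0, 1, 2, 3, 4]
    print(subset_from_index(math.comb(128, 5) - 1))      # [123, 124, 125, 126, 127]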

If we precompute the corresponding S(n,k) tables, of which there are only about 640 relevant (n, k) combinations, we can use binary search instead of iteration to find i at each step, so the whole calculation takes O(k log n) time, which is not terrible.
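With precomputed tables, the scan becomes a binary search over cumulative group sizes; a sketch under the same assumptions (here C(m, k) doubles as the cumulative count of k-sets whose largest element is below m):

    import bisect
    import math

    N, K = 128, 5

    # CUM[k][m] = C(m, k) = number of k-subsets whose largest element is < m.
    # Roughly 6 * 129 entries in total -- the small tables mentioned above.
    CUM = [[math.comb(m, k) for m in range(N + 1)] for k in range(K + 1)]

    def subset_from_index_bisect(index, k=K):
        """Binary-search decoding: for each element, find the largest m with
        CUM[k][m] <= index."""
        subset = []
        while k > 0:
            m = bisect.bisect_right(CUM[k], index) - 1   # largest element
            index -= CUM[k][m]
            subset.append(m)
            k -= 1
        return sorted(subset)

    print(subset_from_index_bisect(0))                      # [0, 1, 2, 3, 4]
    print(subset_from_index_bisect(math.comb(N, K) - 1))    # [123, 124, 125, 126, 127]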


A naive implementation would use a bitmap (bit X == 1 meaning that element X is present in the set), with the additional constraint that at most 5 bits of the mask may be set. 128 bits are required to represent a set this way.

Using a product of primes to represent the set, you need fewer than 64 bits per set: assign each element a distinct prime, and even the five largest of the first 128 primes (683, 691, 701, 709, 719, for elements 123...127) have a product that fits in 64 bits, IIUC. It still squanders some bits (a good enumeration fits in 32 bits, as the OP showed), but it makes it easy to check whether two sets have overlapping elements by taking the GCD of their representations.
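A sketch of that prime-product encoding (the prime generation and the gcd overlap test are my own illustration; Python's big ints hide the 64-bit detail, which is just a bound on the product's size):

    from math import gcd

    def first_primes(count):
        """First `count` primes by simple trial division (fine for count = 128)."""
        primes = []
        candidate = 2
        while len(primes) < count:
            if all(candidate % p for p in primes):
                primes.append(candidate)
            candidate += 1
        return primes

    PRIMES = first_primes(128)          # one distinct prime per element 0..127

    def encode(subset):
        """Represent a subset by the product of its elements' primes."""
        product = 1
        for element in subset:
            product *= PRIMES[element]
        return product

    a = encode({3, 17, 99})
    b = encode({17, 42})
    print(PRIMES[-5:])                                       # [683, 691, 701, 709, 719]
    print(encode({123, 124, 125, 126, 127}).bit_length())    # 48 -- well under 64 bits
    print(gcd(a, b) != 1)                                    # True: both contain element 17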

Both representations would still need a sorted array of the encoded values, using the rank of a set within that array as its enumeration index.

