How to generate all combinations of combinations in random order

Firstly, I'm not even sure the terminology is correct, since I could not find anything like this (and I don't even know which keywords to search for).

Problem: There is a population of people, and I want to assign them to groups. I have a set of rules that gives each assignment a score. I want to find the best assignment (or at least a very good one).

For example, with a population of four {A,B,C,D} and an assignment into two groups of two, the possible assignments are:

```
{A,B},{C,D}
{A,C},{B,D}
{A,D},{B,C}
```

And, for example, {B,A},{C,D} and {C,D},{A,B} are the same as the first one (I am not interested in the order within the groups, nor in the order of the groups themselves).

The number of people, the number of groups and the number of people in each group are all inputs.

My idea was to enumerate every possible assignment, calculate its score, and keep track of the best. That is, brute force. Since the population can be large, I thought of going through the assignments in random order and returning the best one found when time runs out (probably when the user gets bored or decides the current result is good enough). The population can range from very small (the four listed) to really large (maybe 200+), so just trying random assignments without caring about repetitions breaks down for small populations, where brute force is feasible (and I would not know when to stop if I used plain random permutations).

The population is large enough that listing all the assignments in order to shuffle them does not fit in memory. Thus, I need either a method to produce all possible assignments in random order, or a method that, given an index, generates the corresponding assignment, so that I can shuffle an array of indices (the second would be better, because then I could easily distribute the work across several servers).

+6
6 answers

A simple recursive algorithm for generating these pairings is to pair the first element with each of the remaining elements, and for each such pairing, recursively generate all pairings of the remaining elements. For larger groups, generate all groups consisting of the first element plus each combination of the remaining elements, then recurse on the rest.

You can calculate how many possible groupings there are as follows:

```java
public static int numGroupingCombinations(int n, int groupSize) {
    if (n % groupSize != 0)
        return 0; // n must be a multiple of groupSize
    int count = 1;
    while (n > groupSize) {
        count *= nCr(n - 1, groupSize - 1);
        n -= groupSize;
    }
    return count;
}

public static int nCr(int n, int r) {
    int ret = 1;
    for (int k = 0; k < r; k++) {
        ret = ret * (n - k) / (k + 1);
    }
    return ret;
}
```
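As a sanity check, the same count can be computed in Python with exact big-integer arithmetic (math.comb avoids the overflow that the int-based version is prone to for large inputs); this is my own sketch, not part of the original answer:

```python
from math import comb

def num_grouping_combinations(n, group_size):
    """Count groupings of n elements into groups of group_size (order ignored)."""
    if n % group_size != 0:
        return 0  # n must be a multiple of group_size
    count = 1
    while n > group_size:
        # fix the smallest remaining element, choose its group-mates
        count *= comb(n - 1, group_size - 1)
        n -= group_size
    return count

print(num_grouping_combinations(4, 2))  # 3, matching the {A,B,C,D} example
```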

Thus, I need either a method to produce all possible assignments in random order, or a method that, given an index, generates the corresponding assignment, so that I can shuffle an array of indices (the second would be better, because then I could easily distribute the work across several servers).

To generate a grouping from an index, choose the combination of elements to group with the first element by taking the index modulo the number of possible combinations, and build the combination from the result using this algorithm. Then divide the index by the same number and recursively generate the grouping of the remaining set.

```java
public static void generateGrouping(String[] elements, int groupSize, int start, int index) {
    if (elements.length % groupSize != 0)
        return;
    int remainingSize = elements.length - start;
    if (remainingSize == 0) {
        // output the elements:
        for (int i = 0; i < elements.length; i += groupSize) {
            System.out.print("[");
            for (int j = 0; j < groupSize; j++)
                System.out.print(((j == 0) ? "" : ",") + elements[i + j]);
            System.out.print("]");
        }
        System.out.println("");
        return;
    }
    int combinations = nCr(remainingSize - 1, groupSize - 1);
    // decide which combination of remaining elements to pair the first element with:
    int[] combination = getKthCombination(remainingSize - 1, groupSize - 1, index % combinations);
    // swap elements into place:
    for (int i = 0; i < groupSize - 1; i++) {
        String temp = elements[start + 1 + i];
        elements[start + 1 + i] = elements[start + 1 + combination[i]];
        elements[start + 1 + combination[i]] = temp;
    }
    generateGrouping(elements, groupSize, start + groupSize, index / combinations);
    // swap them back:
    for (int i = groupSize - 2; i >= 0; i--) {
        String temp = elements[start + 1 + i];
        elements[start + 1 + i] = elements[start + 1 + combination[i]];
        elements[start + 1 + combination[i]] = temp;
    }
}

public static void getKthCombination(int n, int r, int k, int[] c, int start, int offset) {
    if (r == 0)
        return;
    if (r == n) {
        for (int i = 0; i < r; i++)
            c[start + i] = i + offset;
        return;
    }
    int count = nCr(n - 1, r - 1);
    if (k < count) {
        c[start] = offset;
        getKthCombination(n - 1, r - 1, k, c, start + 1, offset + 1);
        return;
    }
    getKthCombination(n - 1, r, k - count, c, start, offset + 1);
}

public static int[] getKthCombination(int n, int r, int k) {
    int[] c = new int[r];
    getKthCombination(n, r, k, c, 0, 0);
    return c;
}
```

Demo

The start parameter is how far into the list you are, so pass zero when calling the function at the top level. The function can easily be rewritten iteratively. You could also pass an array of indices instead of the array of objects you want to group, if swapping the objects themselves is expensive.
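For reference, the same index-to-combination decoding can be sketched iteratively in Python (math.comb plays the role of nCr; names are my own):

```python
from math import comb

def kth_combination(n, r, k):
    """Return the k-th (0-based) r-element combination of range(n)."""
    result, offset = [], 0
    while r > 0:
        if r == n:  # all remaining elements must be taken
            result.extend(range(offset, offset + r))
            break
        count = comb(n - 1, r - 1)  # combinations containing the smallest element
        if k < count:
            result.append(offset)
            n, r, offset = n - 1, r - 1, offset + 1
        else:
            n, k, offset = n - 1, k - count, offset + 1
    return result

print(kth_combination(4, 2, 0))  # [0, 1]
print(kth_combination(4, 2, 5))  # [2, 3]
```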

+2

Say we have a total of N elements that we want to organize into G groups of E elements each (with G * E = N). Neither the order of the groups nor the order of elements within a group matters. The ultimate goal is to produce every solution in a random order, knowing that we cannot store every solution at once.

First, think about how to produce one solution. Since order does not matter, we can normalize any solution by sorting the elements within each group, and then sorting the groups by their first element.

For example, if we consider the population {A, B, C, D}, with N = 4, G = 2, E = 2, then the solution {B,D}, {C,A} can be normalized as {A,C}, {B,D}. The elements are sorted within each group (A before C), and the groups are sorted by their first element (A before B).
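This normalization is a one-liner in, say, Python (my illustration, not part of the answer); since groups are disjoint, sorting the group tuples lexicographically is the same as sorting them by first element:

```python
def normalize(groups):
    # sort within each group, then sort the groups themselves
    return sorted(tuple(sorted(g)) for g in groups)

print(normalize([('B', 'D'), ('C', 'A')]))  # [('A', 'C'), ('B', 'D')]
```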

Once solutions are normalized, the first element of the first group is always the first element of the population. The second element is one of the remaining N-1, the third is one of the remaining N-2, and so on, except that these elements must be sorted. Thus, there are (N-1)!/((N-E)!*(E-1)!) possibilities for the first group.

Similarly, the first element of each following group is fixed: it is the first of the elements remaining after the previous groups have been built. Thus, the number of possibilities for the (n+1)-th group (n from 0 to G-1) is (N-nE-1)!/((N-(n+1)E)!*(E-1)!) = ((G-n)E-1)!/(((G-n-1)E)!*(E-1)!).

This gives us one possible way to index a solution. The index is not a single integer, but an array of G integers; the n-th integer (n still from 0 to G-1) ranges from 1 to (N-nE-1)!/((N-nE-E)!*(E-1)!) and represents the (n+1)-th group of the solution. Such indices are easy to generate at random and to check for duplicates.

The last thing we need is a way to build a group from the corresponding integer: we need to select E-1 elements out of the remaining N-nE-1. At this point you could list every combination and pick the one at that index, but of course this can be done without materializing every combination: see this question.
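A minimal Python sketch of the whole scheme, with one integer per group decoded by an unranking helper (all names and the helper itself are my own illustration, not the answer's):

```python
from math import comb
import random

def kth_combination(n, r, k):
    # k-th (0-based) r-element subset of range(n), in lexicographic order
    result, offset = [], 0
    while r > 0:
        if r == n:
            result.extend(range(offset, offset + r))
            break
        count = comb(n - 1, r - 1)
        if k < count:
            result.append(offset)
            n, r, offset = n - 1, r - 1, offset + 1
        else:
            n, k, offset = n - 1, k - count, offset + 1
    return result

def random_index(N, E):
    # one integer per group; the n-th ranges over C(N - n*E - 1, E - 1) values
    return [random.randrange(comb(N - n * E - 1, E - 1)) for n in range(N // E)]

def decode(index_parts, population, E):
    # build the normalized solution group by group
    remaining = sorted(population)
    groups = []
    for part in index_parts:
        first, rest = remaining[0], remaining[1:]
        chosen = kth_combination(len(rest), E - 1, part)
        groups.append(tuple([first] + [rest[i] for i in chosen]))
        remaining = [x for i, x in enumerate(rest) if i not in chosen]
    return groups

print(decode([0, 0], 'ABCD', 2))  # [('A', 'B'), ('C', 'D')]
print(decode(random_index(4, 2), 'ABCD', 2))
```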

Out of curiosity, the total number of solutions is (GE)!/(G!*(E!)^G).
In your example, this is (2*2)!/(2!*(2!)^2) = 3.
For N = 200 and E = 2, there are 6.7e186 solutions.
For N = 200 and E = 5, there are 6.6e243 solutions (the maximum I found for 200 elements).
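These counts are easy to verify with exact integer arithmetic (my own sketch):

```python
from math import factorial

def total_solutions(G, E):
    # (GE)! / (G! * (E!)^G)
    return factorial(G * E) // (factorial(G) * factorial(E) ** G)

print(total_solutions(2, 2))           # 3
print(float(total_solutions(100, 2)))  # N = 200, E = 2: about 6.7e186
```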

Also, for N = 200 and E > 13, the number of possibilities for the first group is greater than 2^64 (so it cannot be stored in a 64-bit integer), which is a problem for representing the index. But as long as you don't need groups of more than 13 elements, you can use arrays of 64-bit integers as indices.

+2

What you call "assignments" are partitions with a fixed number of equal-sized parts. Well, almost: you did not specify what should happen if (# of groups) * (size of each group) is smaller or larger than the size of your population.

Generating every possible partition in no particular order is not too hard, but it is only useful for small populations, or for filtering and finding partitions that match some independent criteria. If you need to optimize or minimize something, you will end up scanning the whole set of partitions, which may not be feasible.

Based on the description of your actual problem, you want to read up on local search and combinatorial optimization, of which the simulated annealing suggested in another answer is one technique.


With all that said, here is a simple recursive Python function that generates fixed-length partitions with equal part sizes in no particular order. It is a specialization of my answer to a similar partition problem, and that answer is itself a specialization of this answer. It should be fairly easy to translate to JavaScript (with ES6 generators).

```python
def special_partitions(population, num_groups, group_size):
    """Yields all partitions with a fixed number of equally sized parts.

    Each yielded partition is a list of length `num_groups`, and
    each part a tuple of length `group_size`.
    """
    assert len(population) == num_groups * group_size
    groups = []  # a list of lists, currently empty

    def assign(i):
        if i >= len(population):
            yield list(map(tuple, groups))
        else:
            # try to assign to an existing group, if possible
            for group in groups:
                if len(group) < group_size:
                    group.append(population[i])
                    yield from assign(i + 1)
                    group.pop()
            # assign to an entirely new group, if possible
            if len(groups) < num_groups:
                groups.append([population[i]])
                yield from assign(i + 1)
                groups.pop()

    yield from assign(0)


for partition in special_partitions('ABCD', 2, 2):
    print(partition)

print()

for partition in special_partitions('ABCDEF', 2, 3):
    print(partition)
```

On execution, this prints:

```
[('A', 'B'), ('C', 'D')]
[('A', 'C'), ('B', 'D')]
[('A', 'D'), ('B', 'C')]

[('A', 'B', 'C'), ('D', 'E', 'F')]
[('A', 'B', 'D'), ('C', 'E', 'F')]
[('A', 'B', 'E'), ('C', 'D', 'F')]
[('A', 'B', 'F'), ('C', 'D', 'E')]
[('A', 'C', 'D'), ('B', 'E', 'F')]
[('A', 'C', 'E'), ('B', 'D', 'F')]
[('A', 'C', 'F'), ('B', 'D', 'E')]
[('A', 'D', 'E'), ('B', 'C', 'F')]
[('A', 'D', 'F'), ('B', 'C', 'E')]
[('A', 'E', 'F'), ('B', 'C', 'D')]
```
+2

Simulated annealing may work. You can start with a suboptimal initial solution and iteratively improve it with a heuristic.

Your scoring criteria may help you choose an initial solution, e.g. build the best-scoring first group you can, then with what is left build the best second group, and so on.

A good definition of "neighboring states" will depend on your scoring criteria, but at a minimum you could consider two states neighbors if they differ by a single swap.

Thus, the iterative part of the algorithm would be to try a bunch of randomly selected swaps and pick the one that improves the global score, in accordance with the annealing schedule.

Hopefully you can find a better definition of neighboring states, that is, better rules for iterative improvement, based on your scoring criteria.
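A bare-bones Python sketch of that loop, with swap-neighbors and a geometric cooling schedule (the function names, the schedule, and the toy score are my own assumptions, not part of the answer):

```python
import math
import random

def anneal(groups, score, steps=5000, t0=1.0, cooling=0.999):
    """Simulated annealing over equal-size groupings; higher score is better.

    Neighbors differ by swapping one member between two groups.
    """
    current = [list(g) for g in groups]
    best = [list(g) for g in current]
    best_score = current_score = score(current)
    t = t0
    for _ in range(steps):
        a, b = random.sample(range(len(current)), 2)
        i, j = random.randrange(len(current[a])), random.randrange(len(current[b]))
        current[a][i], current[b][j] = current[b][j], current[a][i]  # propose swap
        new_score = score(current)
        delta = new_score - current_score
        if delta >= 0 or random.random() < math.exp(delta / t):
            current_score = new_score  # accept
            if new_score > best_score:
                best = [list(g) for g in current]
                best_score = new_score
        else:
            current[a][i], current[b][j] = current[b][j], current[a][i]  # undo
        t *= cooling
    return best, best_score

# demo: balance the group sums of 0..11 across 3 groups of 4
spread = lambda gs: -(max(sum(g) for g in gs) - min(sum(g) for g in gs))
print(anneal([[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]], spread))
```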

+1

If the population is large enough that you cannot fit all assignments in memory and are unlikely to ever try them all, then the simplest method is just to pick test assignments at random. For instance:

```
repeat
    randomly shuffle population
    put 1st n/2 members of the shuffled pop into assig1 and 2nd n/2 into assig2
    score assignation and record it if best so far
until bored
```
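That pseudocode, generalized to any number of equal-size groups, might look like this in Python (the scoring function is a placeholder of my own):

```python
import random

def random_search(population, group_size, score, tries=1000):
    """Repeatedly shuffle, split into equal groups, and keep the best."""
    pop = list(population)
    best, best_score = None, float('-inf')
    for _ in range(tries):
        random.shuffle(pop)
        assignment = [tuple(pop[i:i + group_size])
                      for i in range(0, len(pop), group_size)]
        s = score(assignment)
        if s > best_score:
            best, best_score = assignment, s
    return best, best_score

# toy score: spread letters apart within each group
demo_score = lambda gs: sum(max(map(ord, g)) - min(map(ord, g)) for g in gs)
print(random_search('ABCDEF', 3, demo_score, tries=200))
```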

If the population is large, you are unlikely to lose much efficiency to duplicated tests, since it is unlikely that you will generate the same assignment twice.

Depending on your scoring rules, it might be more efficient to choose the next assignment to test based on the best one found so far, for example by swapping a pair of members of the current best assignment, but you have not given enough information to tell whether that is the case.

0

Below is an approach that focuses on your optimization problem (and ignores your permutation-based approach).

I formulate the problem as a mixed-integer program and use specialized solvers to compute good solutions.

Since your problem is not fully specified, this may need some adjustment. But the general message is: this approach will be hard to beat!

The code (note: it is written against the old cvxpy 0.4 API, e.g. Bool, sum_entries, mul_elemwise, which cvxpy 1.x has since renamed)

```python
import numpy as np
from cvxpy import *

""" Parameters """
N_POPULATION = 50
GROUPSIZES = [3, 6, 12, 12, 17]
assert sum(GROUPSIZES) == N_POPULATION
N_GROUPS = len(GROUPSIZES)
OBJ_FACTORS = [0.4, 0.1, 0.15, 0.35]  # age is the most important

""" Create fake data """
age_vector = np.clip(np.random.normal(loc=35.0, scale=10.0, size=N_POPULATION).astype(int), 0, np.inf)
height_vector = np.clip(np.random.normal(loc=180.0, scale=15.0, size=N_POPULATION).astype(int), 0, np.inf)
weight_vector = np.clip(np.random.normal(loc=85, scale=20, size=N_POPULATION).astype(int), 0, np.inf)
skill_vector = np.random.randint(0, 100, N_POPULATION)

""" Calculate a-priori stats """
age_mean, height_mean, weight_mean, skill_mean = np.mean(age_vector), np.mean(height_vector), \
                                                 np.mean(weight_vector), np.mean(skill_vector)

""" Build optimization-model """
# Variables
X = Bool(N_POPULATION, N_GROUPS)  # 1 if part of group
D = Variable(4, N_GROUPS)         # aux-var for deviation-norm

# Constraints
constraints = []

# (1) each person is exactly in one group
for p in range(N_POPULATION):
    constraints.append(sum_entries(X[p, :]) == 1)

# (2) each group has exactly n (a-priori known) members
for g_ind, g_size in enumerate(GROUPSIZES):
    constraints.append(sum_entries(X[:, g_ind]) == g_size)

# Objective: minimize deviation from global-statistics within each group
# (ugly code; could be improved a lot!)
group_deviations = [[], [], [], []]  # age, height, weight, skill
for g_ind, g_size in enumerate(GROUPSIZES):
    group_deviations[0].append((sum_entries(mul_elemwise(age_vector, X[:, g_ind])) / g_size) - age_mean)
    group_deviations[1].append((sum_entries(mul_elemwise(height_vector, X[:, g_ind])) / g_size) - height_mean)
    group_deviations[2].append((sum_entries(mul_elemwise(weight_vector, X[:, g_ind])) / g_size) - weight_mean)
    group_deviations[3].append((sum_entries(mul_elemwise(skill_vector, X[:, g_ind])) / g_size) - skill_mean)

for i in range(4):
    for g in range(N_GROUPS):
        constraints.append(D[i, g] >= abs(group_deviations[i][g]))

obj_parts = [sum_entries(OBJ_FACTORS[i] * D[i, :]) for i in range(4)]
objective = Minimize(sum(obj_parts))

""" Build optimization-problem & solve """
problem = Problem(objective, constraints)
problem.solve(solver=GUROBI, verbose=True, TimeLimit=120)  # might need to use non-commercial solver here
print('Min-objective: ', problem.value)

""" Evaluate solution """
filled_groups = [[] for g in range(N_GROUPS)]
for g_ind, g_size in enumerate(GROUPSIZES):
    for p in range(N_POPULATION):
        if np.isclose(X[p, g_ind].value, 1.0):
            filled_groups[g_ind].append(p)

for g_ind, g_size in enumerate(GROUPSIZES):
    print('Group: ', g_ind, ' of size: ', g_size)
    print(' ' + str(filled_groups[g_ind]))

group_stats = []
for g in range(N_GROUPS):
    age_mean_in_group = age_vector[filled_groups[g]].mean()
    height_mean_in_group = height_vector[filled_groups[g]].mean()
    weight_mean_in_group = weight_vector[filled_groups[g]].mean()
    skill_mean_in_group = skill_vector[filled_groups[g]].mean()
    group_stats.append((age_mean_in_group, height_mean_in_group,
                        weight_mean_in_group, skill_mean_in_group))

print('group-assignment solution means: ')
for g in range(N_GROUPS):
    print(np.round(group_stats[g], 1))

""" Compare with input """
input_data = np.vstack((age_vector, height_vector, weight_vector, skill_vector))
print('input-means')
print(age_mean, height_mean, weight_mean, skill_mean)
print('input-data')
print(input_data)
```

Output (2-minute time limit, commercial solver)

```
Time limit reached
Best objective 9.612058823514e-01, best bound 4.784117647059e-01, gap 50.2280%
('Min-objective: ', 0.961205882351435)
('Group: ', 0, ' of size: ', 3)
 [16, 20, 27]
('Group: ', 1, ' of size: ', 6)
 [26, 32, 34, 45, 47, 49]
('Group: ', 2, ' of size: ', 12)
 [0, 6, 10, 12, 15, 21, 24, 30, 38, 42, 43, 48]
('Group: ', 3, ' of size: ', 12)
 [2, 3, 13, 17, 19, 22, 23, 25, 31, 36, 37, 40]
('Group: ', 4, ' of size: ', 17)
 [1, 4, 5, 7, 8, 9, 11, 14, 18, 28, 29, 33, 35, 39, 41, 44, 46]
group-assignment solution means:
[ 33.3 179.3  83.7  49. ]
[ 33.8 178.2  84.3  49.2]
[ 33.9 178.7  83.8  49.1]
[ 33.8 179.1  84.1  49.2]
[ 34.  179.6  84.7  49. ]
input-means
(33.859999999999999, 179.06, 84.239999999999995, 49.100000000000001)
input-data
[[  22.   35.   28.   32.   41.   26.   25.   37.   32.   26.   36.   36.   27.   34.   38.   38.   38.   47.   35.   35.   34.   30.   38.   34.   31.   21.   25.   28.   22.   40.   30.   18.   32.   46.   38.   38.   49.   20.   53.   32.   49.   44.   44.   42.   29.   39.   21.   36.   29.   33.]
 [ 161.  158.  177.  195.  197.  206.  169.  182.  182.  198.  165.  185.  171.  175.  176.  176.  172.  196.  186.  172.  184.  198.  172.  162.  171.  175.  178.  182.  163.  176.  192.  182.  187.  161.  158.  191.  182.  164.  178.  174.  197.  156.  176.  196.  170.  197.  192.  171.  191.  178.]
 [  85.  103.   99.   93.   71.  109.   63.   87.   60.   94.   48.  122.   56.   84.   69.  162.  104.   71.   92.   97.  101.   66.   58.   69.   88.   69.   80.   46.   74.   61.   25.   74.   59.   69.  112.   82.  104.   62.   98.   84.  129.   71.   98.  107.  111.  117.   81.   74.  110.   64.]
 [  81.   67.   49.   74.   65.   93.   25.    7.   99.   34.   37.    1.   25.    1.   96.   36.   39.   41.   33.   28.   17.   95.   11.   80.   27.   78.   97.   91.   77.   88.   29.   54.   16.   67.   26.   13.   31.   57.   84.    3.   87.    7.   99.   35.   12.   44.   71.   43.   16.   69.]]
```

Comments on the solution

  • This solution looks pretty good (in terms of average deviation), and it took 2 minutes (a time limit we set a priori)
  • We also got a bound from the solver: 0.961 is our solution; we know the optimum cannot be lower than 0.478

Reproducibility

  • The code uses numpy and cvxpy
  • A commercial solver (Gurobi) was used
    • You may need to use a non-commercial MIP solver instead (it should support a time limit for early termination, returning the best solution found so far)
    • Open-source MIP solvers supported by cvxpy include cbc (I found no way to set time limits) and glpk (check the docs for time-limit support)

Modeling choices

  • The code uses an L1-norm penalty, which leads to a MIP
  • Depending on your problem, it may be wiser to use an L2-norm penalty (one large deviation hurts more than many small ones), which leads to a harder problem ( MIQP / MISOCP )
0
