How to split a list into n groups in all possible combinations of length and group elements within a group?

I want to split the list into n groups in all possible combinations (taking into account the variable length of the group).

Let's say I have the following list:

lst=[1,2,3,4] 

If I specify n = 2, the list can be divided into groups of 1 element-3 or 2 element-2 elements. Within the two ways of separating the list, there are unique combinations of elements that are included in each list.

With n = 2, these will be:

 (1),(2,3,4) (2),(1,3,4) (3),(1,2,4) (4),(1,2,3) (1,2),(3,4) (1,3),(2,4) (1,4),(2,3) 

With n = 1 it will be:

 (1,2,3,4) 

And with n = 3 it will be:

 (1),(2),(3,4) (1),(3),(2,4) (1),(4),(2,3) (2),(3),(1,4) (2),(4),(1,3) (3),(4),(1,2) 

I'm not interested in groups of length 0, and the order inside the group doesn't matter.

I found two similar questions, but they definitely do not answer my question.

This question splits the list into all combinations, where each group has a length n (I found the answer by @tokland) to be especially useful). However, I am not looking for all groups to have the same length.

And then the first step of this question gets unique combinations of split places to break the list into n groups. However, the order of the list is preserved here, and unique combinations of elements within these groups are not defined.

I am looking for a combination of these two questions - the list is divided into n groups in all possible combinations of group lengths, as well as combinations of elements within the group.

+5
source share
2 answers

We can use the basic recursive algorithm from this answer and modify it to create partitions of a certain length without the need to generate and filter unwanted partitions.

 def sorted_k_partitions(seq, k): """Returns a list of all unique k-partitions of `seq`. Each partition is a list of parts, and each part is a tuple. The parts in each individual partition will be sorted in shortlex order (ie, by length first, then lexicographically). The overall list of partitions will then be sorted by the length of their first part, the length of their second part, ..., the length of their last part, and then lexicographically. """ n = len(seq) groups = [] # a list of lists, currently empty def generate_partitions(i): if i >= n: yield list(map(tuple, groups)) else: if n - i > k - len(groups): for group in groups: group.append(seq[i]) yield from generate_partitions(i + 1) group.pop() if len(groups) < k: groups.append([seq[i]]) yield from generate_partitions(i + 1) groups.pop() result = generate_partitions(0) # Sort the parts in each partition in shortlex order result = [sorted(ps, key = lambda p: (len(p), p)) for ps in result] # Sort partitions by the length of each part, then lexicographically. result = sorted(result, key = lambda ps: (*map(len, ps), ps)) return result 

Quite a lot is going on here, so let me explain.

First, we start with a bottom-up procedural implementation (teminology?) Of the same recursive algorithm mentioned above :

 def partitions(seq): """-> a list of all unique partitions of `seq` in no particular order. Each partition is a list of parts, and each part is a tuple. """ n = len(seq) groups = [] # a list of lists, currently empty def generate_partitions(i): if i >= n: yield list(map(tuple, groups)) else: for group in groups group.append(seq[i]) yield from generate_partitions(i + 1) group.pop() groups.append([seq[i]]) yield from generate_partitions(i + 1) groups.pop() if n > 0: return list(generate_partitions(0)) else: return [[()]] 

The main algorithm is in the nested function generate_partitions . In principle, it looks through the sequence, and for each element it: 1) puts the element in each of the current groups (aka parts) in the working set and returns; 2) puts the element in its own new group.

When we get to the end of the sequence ( i == n ), we get a (deep) copy of the working set that we collected.

Now, to get partitions of a certain length, we could just filter or group the results for those we are looking for and do with them, but this approach does a lot of unnecessary work (i.e. recursive calls) if we just wanted partitions of some length k .

Please note that in the above function, the section length (i.e. the number of groups) increases whenever:

  # this adds a new group (or part) to the partition groups.append([seq[i]]) yield from generate_partitions(i + 1) groups.pop() 
Performed

.... Thus, we limit the size of the partition by simply placing protection on this block, for example:

 def partitions(seq, k): ... def generate_partitions(i): ... # only add a new group if the total number would not exceed k if len(groups) < k: groups.append([seq[i]]) yield from generate_partitions(i + 1) groups.pop() 

Adding a new parameter and only this line to the partitions function will now lead to the generation of only partitions up to k in length. This is almost what we want. The problem is that the for loop sometimes still generates partitions shorter than k .

To trim those recursive branches, we only need to execute the for loop, when we can be sure that we have enough remaining elements in our sequence to expand the working set of a total of groups k . The number of remaining elements - or elements that have not yet been placed in the group - n - i (or len(seq) - i ). And k - len(groups) is the number of new groups that need to be added to create a valid k-partition. If n - i <= k - len(groups) , we cannot discard the element by adding it to one of the current groups - we must create a new group.

So, we just add one more guard, this time to another recursive branch:

  def generate_partitions(i): ... # only add to current groups if the number of remaining items # exceeds the number of required new groups. if n - i > k - len(groups): for group in groups: group.append(seq[i]) yield from generate_partitions(i + 1) group.pop() # only add a new group if the total number would not exceed k if len(groups) < k: groups.append([seq[i]]) yield from generate_partitions(i + 1) groups.pop() 

And with that, you have a working k-partition generator. You could probably collapse some of the recursive calls even more (for example, if there are 3 remaining elements and we need 3 more groups, then you already know that you have to separate each element into your own group), but I wanted to show how small modification of the basic algorithm that generates all sections.

It remains only to sort the results. Unfortunately, instead of figuring out how to directly generate sections in the right order (an exercise for a smarter dog), I cheated and just sorted the post-generation.

 def sorted_k_partitions(seq, k): ... result = generate_partitions(0) # Sort the parts in each partition in shortlex order result = [sorted(ps, key = lambda p: (len(p), p)) for ps in result] # Sort partitions by the length of each part, then lexicographically. result = sorted(result, key = lambda ps: (*map(len, ps), ps)) return result 

A few self-evident, with the exception of key features. The first:

 key = lambda p: (len(p), p) 

says that it sorts the sequence by length, and then by the sequence itself (which by default in Python is ordered by lexicography). p means "part". This is used to sort parts / groups within a section. This key means that, for example, (4,) precedes (1, 2, 3) , so that [(1, 2, 3), (4,)] sorted as [(4,), (1, 2, 3)] .

 key = lambda ps: (*map(len, ps), ps) # or for Python versions <3.5: lambda ps: tuple(map(len, ps)) + (ps,) 

ps here means "parts", plural. This indicates that the sequence is sorted by the length of each of its elements (which should be the sequences themselves), then (lexicographically) by the sequence itself. This is used to sort the sections relative to each other, so, for example, [(4,), (1, 2, 3)] precedes [(1, 2), (3, 4)] .

Following:

 seq = [1, 2, 3, 4] for k in 1, 2, 3, 4: for groups in sorted_k_partitions(seq, k): print(k, groups) 

gives:

 1 [(1, 2, 3, 4)] 2 [(1,), (2, 3, 4)] 2 [(2,), (1, 3, 4)] 2 [(3,), (1, 2, 4)] 2 [(4,), (1, 2, 3)] 2 [(1, 2), (3, 4)] 2 [(1, 3), (2, 4)] 2 [(1, 4), (2, 3)] 3 [(1,), (2,), (3, 4)] 3 [(1,), (3,), (2, 4)] 3 [(1,), (4,), (2, 3)] 3 [(2,), (3,), (1, 4)] 3 [(2,), (4,), (1, 3)] 3 [(3,), (4,), (1, 2)] 4 [(1,), (2,), (3,), (4,)] 
+5
source

Following the helpful links from @friendly_dog, I am trying to answer my question by changing the functions used in this publication. I have a rude solution that works, although I am afraid that it is not particularly effective and might use some improvement. I end up creating many sets of partitions than I need, and then filter out those that differ only in sort order.

First I take these 3 functions from Install sections in Python :

 import itertools from copy import deepcopy def slice_by_lengths(lengths, the_list): for length in lengths: new = [] for i in range(length): new.append(the_list.pop(0)) yield new def partition(number): return {(x,) + y for x in range(1, number) for y in partition(number-x)} | {(number,)} def subgrups(my_list): partitions = partition(len(my_list)) permed = [] for each_partition in partitions: permed.append(set(itertools.permutations(each_partition, len(each_partition)))) for each_tuple in itertools.chain(*permed): yield list(slice_by_lengths(each_tuple, deepcopy(my_list))) 

Then I write a function that wraps the subgrups function and applies it to every permutation of my source list. I cycle through these subgroups and, if they are equal in length to the desired number of sections, sort them so that I can identify duplicates. I am not sure if there is a better approach to this.

 def return_partition(my_list,num_groups): filtered=[] for perm in itertools.permutations(my_list,len(my_list)): for sub_group_perm in subgrups(list(perm)): if len(sub_group_perm)==num_groups: #sort within each partition sort1=[sorted(i) for i in sub_group_perm] #sort by first element of each partition sort2=sorted(sort1, key=lambda t:t[0]) #sort by the number of elements in each partition sort3=sorted(sort2, key=lambda t:len(t)) #if this new sorted set of partitions has not been added, add it if sort3 not in filtered: filtered.append(sort3) return filtered 

Running it in my source list of examples, I see that it produces the desired output, it is checked on two sections and three sections.

 >>> for i in return_partition([1,2,3,4],2): ... print i ... [[1], [2, 3, 4]] [[4], [1, 2, 3]] [[1, 2], [3, 4]] [[3], [1, 2, 4]] [[1, 3], [2, 4]] [[2], [1, 3, 4]] [[1, 4], [2, 3]] >>> >>> for i in return_partition([1,2,3,4],3): ... print i ... [[1], [4], [2, 3]] [[3], [4], [1, 2]] [[1], [2], [3, 4]] [[1], [3], [2, 4]] [[2], [4], [1, 3]] [[2], [3], [1, 4]] >>> 
0
source

All Articles