Unique combinations in a list of k, v tuples in Python

I have a list of tuples containing different combinations of elements:

example = [(1,2), (2,1), (1,1), (1,1), (2,1), (2,3,1), (1,2,3)]

I want to group the tuples by unique combination (regardless of element order) and count each group, which should give the result:

result = [((1,2), 3), ((1,1), 2), ((2,3,1), 2)]

The order does not matter, and neither does which permutation of a combination is kept, but it is very important that the operation is performed with lambda functions and that the output is a list of tuples in the format shown above, because I will be working with a Spark RDD object.

My code currently counts the patterns taken from a dataset using:

    from operator import add

    RDD = sc.parallelize(example)
    result = RDD.map(lambda y: (y, 1)) \
                .reduceByKey(add) \
                .collect()
    print(result)

I need an additional .map step that counts the different permutations of a combination together, as described above.
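The missing step amounts to normalising each tuple before counting, so that all permutations of a combination share one key. Below is a minimal plain-Python sketch, assuming the sorted tuple is an acceptable canonical form; `reduce_by_key` is a hypothetical local stand-in for Spark's `reduceByKey` so the logic can be checked without a cluster. In Spark the same lambda would presumably go in `RDD.map(...)` ahead of `.reduceByKey(add)`, although `collect()` does not promise any particular order.

```python
from operator import add

# Hypothetical local stand-in for Spark's reduceByKey: merges values that
# share a key with `func`, keeping keys in first-seen order.
def reduce_by_key(pairs, func):
    merged = {}
    for key, value in pairs:
        merged[key] = func(merged[key], value) if key in merged else value
    return list(merged.items())

example = [(1,2), (2,1), (1,1), (1,1), (2,1), (2,3,1), (1,2,3)]

# The extra map step: sort each tuple so every permutation of a combination
# produces the same key, then pair that key with a count of 1.
pairs = map(lambda y: (tuple(sorted(y)), 1), example)
result = reduce_by_key(pairs, add)
print(result)  # [((1, 2), 3), ((1, 1), 2), ((1, 2, 3), 2)]
```

The key is the sorted form, so (2,3,1) is reported as (1,2,3); the question says that is acceptable.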

6 answers

I solved my problem, although it was hard to work out what I was really looking for. I used:

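    from operator import add

    example = [(1,2), (1,1,1), (1,1), (1,1), (2,1), (3,4), (2,3,1), (1,2,3)]
    RDD = sc.parallelize(example)
    # sorted() gives every permutation the same key; set() on its own
    # does not guarantee an element order.
    result = RDD.map(lambda x: sorted(set(x))) \
                .filter(lambda x: len(x) > 1) \
                .map(lambda x: (tuple(x), 1)) \
                .reduceByKey(add) \
                .collect()
    print(result)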

which also eliminated tuples made up of a single repeated value, such as (1,1) and (1,1,1), which was useful to me.
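To check what that pipeline produces without a Spark cluster, here is a minimal plain-Python equivalent, assuming `collections.Counter` is an acceptable local stand-in for `.map(lambda x: (tuple(x), 1)).reduceByKey(add)`:

```python
from collections import Counter

example = [(1,2), (1,1,1), (1,1), (1,1), (2,1), (3,4), (2,3,1), (1,2,3)]

# Same steps as the RDD pipeline: collapse repeated values into a sorted
# tuple, drop tuples that had only one distinct value, then count each key.
keys = map(lambda x: tuple(sorted(set(x))), example)
keys = filter(lambda x: len(x) > 1, keys)
result = list(Counter(keys).items())
print(result)  # [((1, 2), 2), ((3, 4), 1), ((1, 2, 3), 2)]
```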


You can use an OrderedDict keyed on the sorted form of each element, which keeps the first tuple seen for each combination:

    >>> from collections import OrderedDict
    >>> example = [('a','b'), ('a','a','a'), ('a','a'), ('b','a'), ('c','d'), ('b','c','a'), ('a','b','c')]
    >>> d = OrderedDict()
    >>> for i in example:
    ...     d.setdefault(tuple(sorted(i)), i)
    ...
    ('a', 'b')
    ('a', 'a', 'a')
    ('a', 'a')
    ('a', 'b')
    ('c', 'd')
    ('b', 'c', 'a')
    ('b', 'c', 'a')
    >>> d
    OrderedDict([(('a', 'b'), ('a', 'b')), (('a', 'a', 'a'), ('a', 'a', 'a')), (('a', 'a'), ('a', 'a')), (('c', 'd'), ('c', 'd')), (('a', 'b', 'c'), ('b', 'c', 'a'))])
    >>> list(d.values())
    [('a', 'b'), ('a', 'a', 'a'), ('a', 'a'), ('c', 'd'), ('b', 'c', 'a')]

How about this: maintain a set containing the sorted form of each element that you have already seen. Add an item to the list of results if you have not yet seen its sorted form.

    example = [('a','b'), ('a','a','a'), ('a','a'), ('b','a'), ('c','d'), ('b','c','a'), ('a','b','c')]
    result = []
    seen = set()
    for item in example:
        sorted_form = tuple(sorted(item))
        if sorted_form not in seen:
            result.append(item)
            seen.add(sorted_form)
    print(result)

Result:

 [('a', 'b'), ('a', 'a', 'a'), ('a', 'a'), ('c', 'd'), ('b', 'c', 'a')] 

Since you are looking for a lambda function, try the following:

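    lambda x: (lambda y: [a for a in x if y.setdefault(tuple(sorted(a)), a) and False] or list(y.values()))(OrderedDict())

You can use this lambda function as follows. The OrderedDict is created inside the outer lambda so that each call starts empty; a default argument such as y=OrderedDict() would be created only once and shared between calls:

    from collections import OrderedDict

    uniquify = lambda x: (lambda y: [a for a in x if y.setdefault(tuple(sorted(a)), a) and False] or list(y.values()))(OrderedDict())
    result = uniquify(example)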

Obviously, this sacrifices readability compared to other answers. This basically does the same thing as Kashramwa's answer, in one ugly line.


It looks like a sorted dict.

    from itertools import groupby

    ex = [(1,2,3), (3,2,1), (1,1), (2,1), (1,2), (3,2), (2,3,1)]
    f = lambda x: tuple(sorted(x))  # key: the sorted form of each tuple
    [tuple(k) for k, _ in groupby(sorted(ex, key=f), key=f)]

The best part is that you can also see which tuples share the same combination:

    In [16]: example = [('a','b'), ('a','a','a'), ('a','a'), ('a','a','a','a'), ('b','a'), ('c','d'), ('b','c','a'), ('a','b','c')]

    In [17]: for k, grpr in groupby(sorted(example, key=lambda x: tuple(sorted(x))), key=lambda x: tuple(sorted(x))):
       ....:     print(k, list(grpr))
       ....:
    ('a', 'a') [('a', 'a')]
    ('a', 'a', 'a') [('a', 'a', 'a')]
    ('a', 'a', 'a', 'a') [('a', 'a', 'a', 'a')]
    ('a', 'b') [('a', 'b'), ('b', 'a')]
    ('a', 'b', 'c') [('b', 'c', 'a'), ('a', 'b', 'c')]
    ('c', 'd') [('c', 'd')]
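The same grouping also gives the counts the question asks for, by taking the length of each group. A minimal sketch on the question's integer example; note that the output is ordered by key rather than by first appearance, and each key is the sorted form of the combination:

```python
from itertools import groupby

example = [(1,2), (2,1), (1,1), (1,1), (2,1), (2,3,1), (1,2,3)]

# Canonical form of a combination: its elements in sorted order.
f = lambda x: tuple(sorted(x))

# groupby only merges adjacent items, so sort by the same key first.
result = [(k, sum(1 for _ in grp)) for k, grp in groupby(sorted(example, key=f), key=f)]
print(result)  # [((1, 1), 2), ((1, 2), 3), ((1, 2, 3), 2)]
```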

Based on the comments, what you really need is a map-reduce approach. I don't have Spark, but according to the docs (see the transformations section), it should be something like this:

 data.map(lambda i: (frozenset(i), i)).reduceByKey(lambda _, i : i) 

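However, this keeps whichever tuple is combined last, so if your dataset contains (a, b) and then (b, a), it may return (b, a); which one survives depends on the order in which Spark combines the values.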
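To get a representative tuple and a count together, the value can carry a count alongside the tuple. In Spark this would presumably be `data.map(lambda i: (frozenset(i), (i, 1))).reduceByKey(lambda a, b: (a[0], a[1] + b[1])).values().collect()`, which I have not run on a cluster. The sketch below checks the same logic locally, with a hypothetical `reduce_by_key` helper standing in for Spark's `reduceByKey`. As above, which tuple survives as the representative depends on the order in which values are combined, and frozenset drops repeated elements, so (1,1) and (1,) would share a key.

```python
# Hypothetical local stand-in for Spark's reduceByKey (first-seen key order).
def reduce_by_key(pairs, func):
    merged = {}
    for key, value in pairs:
        merged[key] = func(merged[key], value) if key in merged else value
    return list(merged.items())

example = [(1,2), (2,1), (1,1), (1,1), (2,1), (2,3,1), (1,2,3)]

# Key on the unordered set of elements; the value is (representative, count).
pairs = map(lambda i: (frozenset(i), (i, 1)), example)
# Keep the first representative seen and add up the counts.
merged = reduce_by_key(pairs, lambda a, b: (a[0], a[1] + b[1]))
result = [value for _, value in merged]
print(result)  # [((1, 2), 3), ((1, 1), 2), ((2, 3, 1), 2)]
```

Run sequentially like this, the output matches the format in the question exactly.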

