Ranking items from multiple lists by their score in Python

I want to rank the elements of several lists by how often they appear across all of the lists. Example:

list1 = 1,2,3,4
list2 = 4,5,6,7
list3 = 4,1,8,9

result = 4,1,2,3,5,6,7,8,9 (4 appears three times, 1 twice, and the rest once)

I tried the following, but I need something smarter that works with any number of lists.

    l = []
    l.append([1, 2, 3, 4, 5])
    l.append([1, 9, 3, 4, 5])
    l.append([1, 10, 8, 4, 5])
    l.append([1, 12, 13, 7, 5])
    l.append([1, 14, 13, 13, 6])

    # elements shared by each combination of four lists
    x1 = set(l[0]) & set(l[1]) & set(l[2]) & set(l[3])
    x2 = set(l[0]) & set(l[1]) & set(l[2]) & set(l[4])
    x3 = set(l[0]) & set(l[1]) & set(l[3]) & set(l[4])
    x4 = set(l[0]) & set(l[2]) & set(l[3]) & set(l[4])
    x5 = set(l[1]) & set(l[2]) & set(l[3]) & set(l[4])
    set1 = set(x1) | set(x2) | set(x3) | set(x4) | set(x5)

    # elements shared by all five lists
    a1 = list(set(l[0]) & set(l[1]) & set(l[2]) & set(l[3]) & set(l[4]))
    # getDifference is not shown in the question
    a2 = getDifference(list(set1), a1)
    print(a1)
    print(a2)

Now here is the problem: I could do the same again and again for a3, a4 and a5, but it gets too complicated. I need a function for this, but I don't know how to write it ... my math is stuck ;)

SOLVED: thanks for the discussion. As a beginner, I somehow like this system: fast + informative. You helped me! Ty

+2
python list set rank ranking
5 answers
    import collections

    data = [
        [1, 2, 3, 4, 5],
        [1, 9, 3, 4, 5],
        [1, 10, 8, 4, 5],
        [1, 12, 13, 7, 5],
        [1, 14, 13, 13, 6],
    ]

    def sorted_by_count(lists):
        # count every element across all lists
        counts = collections.defaultdict(int)
        for L in lists:
            for n in L:
                counts[n] += 1

        # highest count first; ties broken by larger value first
        return [num for num, count in
                sorted(counts.items(),
                       key=lambda k_v: (k_v[1], k_v[0]),
                       reverse=True)]

    print(sorted_by_count(data))

Now let me generalize it (accept any iterable, drop the requirement that items be hashable), add key and reverse parameters (to match those of sorted), and rename it to freq_sorted:

    def freq_sorted(iterable, key=None, reverse=False, include_freq=False):
        """Return a list of items from iterable sorted by frequency.

        If include_freq, (item, freq) is returned instead of item.

        key(item) must be hashable, but items need not be.

        *Higher* frequencies are returned first.  Within the same frequency
        group, items are ordered according to key(item).
        """
        if key is None:
            key = lambda x: x

        key_counts = collections.defaultdict(int)
        items = {}
        for n in iterable:
            k = key(n)
            key_counts[k] += 1
            items.setdefault(k, n)

        if include_freq:
            def get_item(k, c):
                return items[k], c
        else:
            def get_item(k, c):
                return items[k]

        return [get_item(k, c) for k, c in
                sorted(key_counts.items(),
                       key=lambda kc: (-kc[1], kc[0]),
                       reverse=reverse)]

Example:

    >>> import itertools
    >>> print(freq_sorted(itertools.chain.from_iterable(data)))
    [1, 5, 4, 13, 3, 2, 6, 7, 8, 9, 10, 12, 14]
    >>> print(freq_sorted(itertools.chain.from_iterable(data), include_freq=True))
    # (slightly reformatted)
    [(1, 5), (5, 4), (4, 3), (13, 3), (3, 2),
     (2, 1), (6, 1), (7, 1), (8, 1), (9, 1), (10, 1), (12, 1), (14, 1)]
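
To illustrate the "items need not be hashable" part, here is a minimal sketch that restates freq_sorted in compact form (same logic, so the block runs on its own) and ranks a list of lists by passing key=tuple. The sample data in rows is made up for illustration and is not from the question.

```python
from collections import defaultdict

def freq_sorted(iterable, key=None, reverse=False, include_freq=False):
    # Compact restatement of the freq_sorted logic, so this sketch is self-contained.
    if key is None:
        key = lambda x: x
    key_counts = defaultdict(int)
    items = {}
    for n in iterable:
        k = key(n)
        key_counts[k] += 1
        items.setdefault(k, n)  # remember the first item seen for each key
    ordered = sorted(key_counts.items(),
                     key=lambda kc: (-kc[1], kc[0]),
                     reverse=reverse)
    if include_freq:
        return [(items[k], c) for k, c in ordered]
    return [items[k] for k, c in ordered]

# Hypothetical sample data: lists are unhashable, so they cannot be counted directly.
rows = [[1, 2], [3], [1, 2], [3], [1, 2], [4]]

# key=tuple gives each list a hashable stand-in that is used for counting.
by_freq = freq_sorted(rows, key=tuple)
with_counts = freq_sorted(rows, key=tuple, include_freq=True)

print(by_freq)      # [[1, 2], [3], [4]]
print(with_counts)  # [([1, 2], 3), ([3], 2), ([4], 1)]
```

The returned items are the original lists, not the tuples; the tuples only serve as dictionary keys.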
+6

Combining a couple of ideas already posted:

    from itertools import chain
    from collections import defaultdict

    def frequency(*lists):
        counter = defaultdict(int)
        for x in chain(*lists):
            counter[x] += 1
        return [key for (key, value) in
                sorted(counter.items(),
                       key=lambda kv: (kv[1], kv[0]),
                       reverse=True)]

Notes:

  • In Python 2.7 and later, you can use collections.Counter instead of defaultdict(int).
  • This version takes any number of lists as arguments; the leading asterisk means that they will all be packed into a tuple. If you want to pass a single list containing all of your lists, omit the leading asterisk.
  • This fails if your lists contain an unhashable type.
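
The first two notes can be sketched as follows. This is one possible Counter-based variant, not the answer's own code; the name frequency_with_counter is made up for illustration, and the sample lists are the ones from the question.

```python
from collections import Counter
from itertools import chain

def frequency_with_counter(*lists):
    # Counter does the counting that defaultdict(int) did in the version above.
    counts = Counter(chain.from_iterable(lists))
    # Same ordering as the answer: highest count first, ties by larger value first.
    ranked = sorted(counts.items(), key=lambda kv: (kv[1], kv[0]), reverse=True)
    return [item for item, count in ranked]

a = [1, 2, 3, 4]
b = [4, 5, 6, 7]
c = [4, 1, 8, 9]

# Pass the lists as separate arguments...
separate = frequency_with_counter(a, b, c)

# ...or unpack a single list of lists at the call site.
list_of_lists = [a, b, c]
unpacked = frequency_with_counter(*list_of_lists)

print(separate)  # [4, 1, 9, 8, 7, 6, 5, 3, 2]
```

Unpacking at the call site is an alternative to removing the asterisk from the function definition; which one reads better depends on how the data arrives.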
+2
    def items_ordered_by_frequency(*lists):
        # get a flat list with all the values
        biglist = []
        for x in lists:
            biglist += x

        # sort it in reverse order by frequency
        return sorted(set(biglist),
                      key=lambda x: biglist.count(x),
                      reverse=True)
+1

Try the following:

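    def rank(*lists):
        # count how often each element occurs across all lists
        d = dict()
        for lst in lists:
            for e in lst:
                if e in d:
                    d[e] += 1
                else:
                    d[e] = 1
        # sort (count, element) pairs, highest count first
        return [j[1] for j in sorted([(d[i], i) for i in d], reverse=True)]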

Usage example:

    a = [1, 2, 3, 4]
    b = [4, 5, 6, 7]
    c = [4, 1, 8, 9]

    print(rank(a, b, c))

You can use any number of lists as input.
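
Because rank sorts (count, element) pairs with reverse=True, elements with the same count come out in descending order. If the order from the question is wanted (4, 1, then the remaining elements ascending), one option is to negate the count in the sort key instead of reversing. The following is a sketch under that assumption; rank_ascending_ties is a hypothetical name, not part of the answer.

```python
def rank_ascending_ties(*lists):
    # Count occurrences with a plain dict, as in rank above.
    counts = {}
    for lst in lists:
        for e in lst:
            counts[e] = counts.get(e, 0) + 1
    # Negating the count puts frequent elements first while
    # leaving elements with equal counts in ascending order.
    return sorted(counts, key=lambda e: (-counts[e], e))

a = [1, 2, 3, 4]
b = [4, 5, 6, 7]
c = [4, 1, 8, 9]

ranked = rank_ascending_ties(a, b, c)
print(ranked)  # [4, 1, 2, 3, 5, 6, 7, 8, 9]
```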

0

You can count the number of occurrences of each element (histogram), and then sort by it:

    def histogram(enumerable):
        # map each element to its number of occurrences
        result = {}
        for x in enumerable:
            result.setdefault(x, 0)
            result[x] += 1
        return result

    lists = [[1, 2, 3, 4], [4, 5, 6, 7], ...]

    from itertools import chain

    h = histogram(chain(*lists))
    ranked = sorted(set(chain(*lists)), key=lambda x: h[x], reverse=True)
0