Sorting a list of pairs by the frequency of the paired elements

I am completely new to Python and, while trying various random bits and pieces, I hit on a problem that I think I "solved", but the code doesn't feel right - I strongly suspect it is not the best way to get the desired result.

FYI - I use the latest version of Python 3 on Windows.

Problem definition

In short, I am sorting a list of pairs so that pairs containing elements that appear in the fewest pairs are sorted to the front.

These pairs are in the form [i,j] with 0 <= i <= j < n, where n is a known upper bound on the element values. There are no duplicate pairs in the list.

The count of an element i is simply the number of pairs (not the number of occurrences of the element) of the form [i,j], [j,i] and [i,i], where j is any value that forms a pair present in the list.
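A minimal sketch of this counting rule, assuming pairs are given as two-element lists (the function name `pair_counts` is hypothetical, not from the question): reducing each pair to the set of its distinct elements makes [i,i] contribute only once.

```python
def pair_counts(pairs, n):
    """Count, for each element 0..n-1, how many pairs contain it.

    A pair like [i, i] is counted once for i, not twice.
    """
    counts = [0] * n
    for pair in pairs:
        for element in set(pair):  # distinct elements of this pair
            counts[element] += 1
    return counts


example = [[0, 0], [1, 2], [1, 4], [2, 2], [2, 3], [3, 3], [3, 4]]
print(pair_counts(example, 5))  # [1, 2, 3, 3, 2]
```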

In the sorted result, the pair [i,j] should appear before the pair [k,l] if count(i) < count(k), or count(i) == count(k) and count(j) < count(l). (If count(j) == count(l), the two can be in either order. I'm not worried about the sort being stable, though that would be a bonus.)

In the sorted result, the pair [i,j] should appear before the pair [k,l] if
min(count(i),count(j)) < min(count(k),count(l)) or
min(count(i),count(j)) == min(count(k),count(l)) and max(count(i),count(j)) < max(count(k),count(l)).
In other words, given a pair [0,1] where 1 has a count of one but 0 has a count of four hundred, the pair should still be at (or at least very near) the front of the list: pairs need to be sorted by the least frequent element they contain.
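One way to express this min/max criterion is a key function that returns a `(min, max)` tuple, since Python compares tuples element by element. This is a sketch assuming the counts are already available in a mapping; `pair_key` and `count` are hypothetical names, and the counts below are made up to mirror the 400-vs-1 scenario.

```python
def pair_key(pair, count):
    """Sort key: the smaller element count first, then the larger one."""
    a, b = count[pair[0]], count[pair[1]]
    return (min(a, b), max(a, b))


# Made-up counts: element 0 appears in 400 pairs, element 1 in only one.
count = {0: 400, 1: 1, 2: 3, 3: 3}
pairs = [[2, 3], [0, 1]]
pairs.sort(key=lambda p: pair_key(p, count))
print(pairs)  # [[0, 1], [2, 3]]
```

Because (1, 400) < (3, 3), the pair [0,1] still lands in front despite element 0 being very frequent.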

Here is a contrived example that I put together:

 input [[0,0],[1,2],[1,4],[2,2],[2,3],[3,3],[3,4]] 

Here are the counts for each element, along with the source pairs they come from:

```
0: 1  [0,0]
1: 2  [1,2],[1,4]
2: 3  [1,2],[2,2],[2,3]
3: 3  [2,3],[3,3],[3,4]
4: 2  [1,4],[3,4]
```

And here is the result, along with the score of each pair:

```
output: [[0,0],[1,4],[1,2],[3,4],[2,2],[2,3],[3,3]]
scores:   1     2    2-3   2-3   3     3     3
```

Here 0 has a count of one (it appears in only one pair, albeit twice), so [0,0] comes first. 1 has a count of two, so its pairs come next, with [1,4] before [1,2] because 4 has a count of two and 2 has a count of three, and so on.
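A short sketch that computes the sorted order and a score label for each pair under the min/max rule, assuming each element is counted once per pair. `score` and `score_label` are hypothetical helpers; the label is a single count when both counts match and `min-max` otherwise.

```python
from collections import Counter

pairs = [[0, 0], [1, 2], [1, 4], [2, 2], [2, 3], [3, 3], [3, 4]]

# Count each element once per pair it appears in.
count = Counter(e for p in pairs for e in set(p))


def score(pair):
    a, b = count[pair[0]], count[pair[1]]
    return (min(a, b), max(a, b))


def score_label(pair):
    lo, hi = score(pair)
    return str(lo) if lo == hi else "%d-%d" % (lo, hi)


result = sorted(pairs, key=score)  # sorted() is stable: ties keep input order
print(result)  # [[0, 0], [1, 4], [1, 2], [3, 4], [2, 2], [2, 3], [3, 3]]
print(" ".join(score_label(p) for p in result))  # 1 2 2-3 2-3 3 3 3
```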

My current solution

As I said, I'm fairly sure this works, but it just feels like there should be a better way to do it. Anyway, here's what I have so far:

```python
#my implementation uncommented to reduce post size, see history for comments
def sortPairList( data , n ):
    count = []
    for i in range(0,n):
        count.append( 0 )

    #count up the data
    for p in data:
        count[p[0]] += 1
        if p[1] != p[0]:
            count[p[1]] += 1

    maxcount = 0
    for i in range(0,n):
        if count[i] > maxcount:
            maxcount = count[i]

    def elementFrequency(p):
        if count[ p[0] ] < count[ p[1] ]:
            return count[ p[0] ] + float(count[ p[1] ]) / (maxcount+1)
        else:
            return count[ p[1] ] + float(count[ p[0] ]) / (maxcount+1)

    data.sort( key=elementFrequency )
```
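The float key in this code works because `count / (maxcount + 1)` is always strictly below 1, so the integer part (the smaller count) decides first and the fraction (the larger count) only breaks ties. The sketch below, with hypothetical helper names and randomly generated pairs, checks that this encoding orders pairs the same way as a plain `(min, max)` tuple key; the tuple version needs neither `maxcount` nor floating-point arithmetic.

```python
import random


def float_key(p, count, maxcount):
    # The question's encoding: smaller count plus the larger count scaled into [0, 1).
    lo, hi = sorted((count[p[0]], count[p[1]]))
    return lo + float(hi) / (maxcount + 1)


def tuple_key(p, count):
    # The same ordering expressed directly as a (min, max) tuple.
    lo, hi = sorted((count[p[0]], count[p[1]]))
    return (lo, hi)


random.seed(0)
n = 10
# Random distinct pairs (i, j) with i <= j; pairs like (i, i) are allowed.
pairs = sorted({tuple(sorted((random.randrange(n), random.randrange(n))))
                for _ in range(30)})
count = [0] * n
for p in pairs:
    for e in set(p):
        count[e] += 1
maxcount = max(count)

by_float = sorted(pairs, key=lambda p: float_key(p, count, maxcount))
by_tuple = sorted(pairs, key=lambda p: tuple_key(p, count))
print(by_float == by_tuple)  # True
```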

Any suggestions for a more "Pythonic" way to do this?
Or is there something wrong with my current attempt?

New test case (see comments for answers)

```
input:    [[0,0],[0,3],[0,5],[0,7],[1,1],[1,2],[1,8],[2,4],[2,5],[3,4],[3,5],[3,9],[4,4],[4,7],[4,8],[6,8],[7,7],[7,9],[8,9]]
expected: [[6,8],[1,1],[1,2],[2,5],[0,5],[1,8],[3,5],[3,9],[7,9],[8,9],[2,4],[0,0],[0,3],[0,7],[7,7],[3,4],[4,7],[4,8],[4,4]]
```
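As a check of the min/max criterion against this test case, the sketch below counts each element once per pair and sorts with a `[min, max]` key using Python's stable `sorted()`. Under those assumptions it reproduces the expected list exactly, including the order of ties, which simply keep their input order.

```python
from collections import Counter

data = [[0,0],[0,3],[0,5],[0,7],[1,1],[1,2],[1,8],[2,4],[2,5],[3,4],
        [3,5],[3,9],[4,4],[4,7],[4,8],[6,8],[7,7],[7,9],[8,9]]
expected = [[6,8],[1,1],[1,2],[2,5],[0,5],[1,8],[3,5],[3,9],[7,9],[8,9],
            [2,4],[0,0],[0,3],[0,7],[7,7],[3,4],[4,7],[4,8],[4,4]]

# Number of pairs each element appears in ([i,i] counts once).
count = Counter(e for p in data for e in set(p))

# Sorting the two counts of a pair gives the list [min, max].
result = sorted(data, key=lambda p: sorted(count[e] for e in p))
print(result == expected)  # True
```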
4 answers

I would probably use Counter (this requires Python ≥ 2.7 or ≥ 3.1).

```python
from collections import Counter
from itertools import chain

def sortPairList2(data):
    tally = Counter(chain(*map(set, data)))
    data.sort(key=lambda x: sorted(tally[i] for i in x))
```
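To show what the `tally` line computes, here is a step-by-step sketch on the question's first example. `map(set, data)` collapses [i,i] to {i}, so each element is counted once per pair, and `chain.from_iterable` (a standard alternative to `chain(*...)` that avoids unpacking every set into a separate argument) flattens the sets before `Counter` tallies them.

```python
from collections import Counter
from itertools import chain

data = [[0, 0], [1, 2], [1, 4], [2, 2], [2, 3], [3, 3], [3, 4]]

sets = list(map(set, data))  # [i, i] collapses to {i}
tally = Counter(chain.from_iterable(sets))
print(sorted(tally.items()))  # [(0, 1), (1, 2), (2, 3), (3, 3), (4, 2)]

# The sort key for a pair is the sorted list of its counts, i.e. [min, max].
print(sorted(tally[i] for i in [3, 4]))  # [2, 3]
```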

Note that:

  • You can create an anonymous function with lambda . For instance,

     ```
     >>> c = 4
     >>> a = lambda p: p - c
     >>> a(7)
     3
     ```
  • The sort key does not have to be a number. Any comparable value can be used as the return value of the key function; in my code, a list is used for ordering.

  • There are simpler Python idioms for several parts of your original code:

    • count can be initialized with count = [0] * n instead of the loop.
    • maxcount can be obtained with the built-in max function: maxcount = max(count).
  • List comprehensions are used a lot in Python. If your goal is to transform one iterable into another, prefer a comprehension over an explicit loop.
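Applying those suggestions to the question's `sortPairList` gives roughly the following sketch. The name `sort_pair_list` is just a PEP 8 spelling chosen here, and the float key is swapped for a `(min, max)` tuple key, which makes `maxcount` unnecessary; the ordering is meant to match the original.

```python
def sort_pair_list(data, n):
    """Sort pairs in place by (smaller element count, larger element count)."""
    count = [0] * n                    # replaces the append loop
    for i, j in data:
        count[i] += 1
        if j != i:                     # a pair [i, i] counts once
            count[j] += 1

    def key(p):
        a, b = count[p[0]], count[p[1]]
        return (min(a, b), max(a, b))  # tuple key; no maxcount needed

    data.sort(key=key)


pairs = [[0, 0], [1, 2], [1, 4], [2, 2], [2, 3], [3, 3], [3, 4]]
sort_pair_list(pairs, 5)
print(pairs)  # [[0, 0], [1, 4], [1, 2], [3, 4], [2, 2], [2, 3], [3, 3]]
```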

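```
>>> inp = [[0,0],[1,2],[1,4],[2,2],[2,3],[3,3],[3,4]]
>>> n = 4
>>> freqs = {i: sum(i in j for j in inp) for i in range(n+1)}
>>> def key(x):
...     a, b = x
...     return min(freqs[a], freqs[b]), max(freqs[a], freqs[b])
...
>>> sorted(inp, key=key)
[[0, 0], [1, 4], [1, 2], [3, 4], [2, 2], [2, 3], [3, 3]]
```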

PS: note that input is a bad variable name, as it shadows the built-in input() function.
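The dictionary comprehension in this answer rescans the whole list once per element, so it performs on the order of n × len(pairs) membership tests; because it uses `i in j`, a pair [i,i] is counted once, as the question requires. A single-pass alternative with `collections.Counter` gives the same counts, as this sketch (using a variable named `pairs` rather than `inp`) checks:

```python
from collections import Counter

pairs = [[0, 0], [1, 2], [1, 4], [2, 2], [2, 3], [3, 3], [3, 4]]
n = 4  # largest element value, as in the answer

# The answer's approach: one pass over `pairs` per element.
freqs = {i: sum(i in j for j in pairs) for i in range(n + 1)}

# Single pass: count the distinct elements of each pair.
tally = Counter(e for p in pairs for e in set(p))

print(freqs == {i: tally[i] for i in range(n + 1)})  # True
```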


While KennyTM's solution works, I tried to do it on my own.

My solution pre-calculates the frequencies and stores them in a dictionary keyed by str(n). I had trouble converting the Python 2-style comparison function into the key function that Python 3 requires, but I found a recipe on ActiveState Code.

```python
from functools import cmp_to_key  # was missing; available in Python 2.7 and 3.2+

item_cnt = {}

def icount(n):
    return item_cnt[str(n)]

def add_item(n):
    sn = str(n)
    try:
        item_cnt[sn] += 1
    except KeyError:
        item_cnt[sn] = 1

# sort callback
def cmp_items(ij, kl):
    i, j = ij
    k, l = kl
    if icount(i) < icount(k) or icount(i) == icount(k) and icount(j) < icount(l):
        return -1
    return 1

input = [[0,0],[1,2],[1,4],[2,2],[2,3],[3,3],[3,4]]

# count all items
for (i, j) in input:
    add_item(i)
    if j != i:  # count a pair like [i,i] only once, per the question's definition
        add_item(j)

# works with Python 2.x
#input.sort(cmp_items)

# works with Python 2.7 and 3.2+ via functools.cmp_to_key;
# for older versions, to convert a compare function to a key look at:
# http://code.activestate.com/recipes/576653-convert-a-cmp-function-to-a-key-function/
input.sort(key=cmp_to_key(cmp_items))
print(input)
```
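`functools.cmp_to_key` is the standard-library counterpart of that recipe. As a sketch, a comparator should return a negative number, zero, or a positive number; returning 0 on ties (which the comparator in this answer never does) keeps it consistent. Here it implements the question's min/max criterion, with hypothetical helper names `scores` and `cmp_pairs`, and is checked against the equivalent tuple key.

```python
from collections import Counter
from functools import cmp_to_key

pairs = [[0, 0], [1, 2], [1, 4], [2, 2], [2, 3], [3, 3], [3, 4]]
count = Counter(e for p in pairs for e in set(p))


def scores(p):
    a, b = count[p[0]], count[p[1]]
    return (min(a, b), max(a, b))


def cmp_pairs(p, q):
    """Three-way comparison: -1 if p sorts first, 1 if q does, 0 on a tie."""
    sp, sq = scores(p), scores(q)
    return (sp > sq) - (sp < sq)


by_cmp = sorted(pairs, key=cmp_to_key(cmp_pairs))
by_key = sorted(pairs, key=scores)
print(by_cmp == by_key)  # True
```

The key-function version is generally preferable in Python 3: it computes each pair's score once instead of on every comparison.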

Similar to KennyTM's solution, but for Python 2.5 or higher (the code uses Python 2 syntax):

```python
import collections

def sort_by_occurence(sequences):
    tally = collections.defaultdict(int)
    for sequence in sequences:
        for item in sequence:
            tally[item] += 1
    sequences.sort(key=lambda x:map(tally.get, x))

pair_list = [[0,0],[1,2],[1,4],[2,2],[2,3],[3,3],[3,4]]
sort_by_occurence(pair_list)
print pair_list
```
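A portability note, sketched below: in Python 2, `map` returns a list, so `key=lambda x: map(tally.get, x)` yields comparable lists. In Python 3, `map` returns an iterator, and iterators cannot be ordered, so the same key raises `TypeError`. Wrapping the result in `list()` restores the comparison. (Note also that this tally counts every occurrence, so [2,2] adds 2 to element 2, unlike the question's once-per-pair definition.)

```python
import collections

pair_list = [[0, 0], [1, 2], [1, 4], [2, 2], [2, 3], [3, 3], [3, 4]]

# Tally every occurrence, as in the answer's code.
tally = collections.defaultdict(int)
for sequence in pair_list:
    for item in sequence:
        tally[item] += 1

try:
    sorted(pair_list, key=lambda x: map(tally.get, x))
except TypeError as exc:
    # Python 3: map objects do not support '<'
    print("TypeError:", exc)

fixed = sorted(pair_list, key=lambda x: list(map(tally.get, x)))
print(fixed)  # [[0, 0], [1, 4], [1, 2], [3, 4], [2, 2], [2, 3], [3, 3]]
```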

Source: https://habr.com/ru/post/1316141/

