Sorting pairs of pairs by the frequency of paired elements

Question

I am completely new to Python and, while trying out various random bits and pieces, I hit on a problem that I think I have "solved", but the code doesn't feel right. I strongly suspect there is a better way to get the desired result.

FYI: I'm using the latest version of Python 3 on Windows.

Problem definition

In short, what I am doing is sorting a list of pairs so that pairs containing elements that appear in the fewest pairs are sorted to the front.

These pairs are in the form [i,j] with 0 <= i <= j < n, where n is the known maximum value for the elements. There are no duplicate pairs in the list.

The count of an element i is simply the number of pairs (not the number of paired elements) of the forms [i,j], [j,i] and [i,i], where j is any value that results in a valid pair.

In the sorted result, the pair [i,j] should appear before the pair [k,l] if count(i) < count(k), or if count(i) == count(k) and count(j) < count(l). (If count(j) == count(l), the two can be in either order. I'm not worried about the sort being stable, although that would be a bonus.)

In the sorted result, the pair [i,j] should appear before the pair [k,l] if
min(count(i),count(j)) < min(count(k),count(l)) or
min(count(i),count(j)) == min(count(k),count(l)) and max(count(i),count(j)) < max(count(k),count(l)).
In other words, if a pair is [0,1] and 1 has a count of one but 0 has a count of four hundred, the pair should still be at (or at least very near) the front of the list: pairs need to be sorted by the frequency of their least frequent element.

Here is a contrived example that I built:

 input [[0,0],[1,2],[1,4],[2,2],[2,3],[3,3],[3,4]]

Here are the counts for each element, along with the source pairs they come from:

    0: 1   [0,0]
    1: 2   [1,2],[1,4]
    2: 3   [1,2],[2,2],[2,3]
    3: 3   [2,3],[3,3],[3,4]
    4: 2   [1,4],[3,4]

And here is the result, along with the pair scores:

    output: [[0,0],[1,4],[1,2],[3,4],[2,2],[2,3],[3,3]]
    scores:    1   1-2   1-3   2-3     3     3     3

Here 0 has a count of one (it appears in only one pair, albeit twice), so it comes first. 1 has a count of two, so it appears second, with [1,4] before [1,2] because 4 has a count of two and 2 has a count of three, and so on.
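The counting rule and the min/max ordering described above can be sketched as follows. This is only an illustration of the problem statement, not the asker's code; the helper names `pair_counts` and `sort_key` are made up for this sketch.

```python
from collections import Counter

def pair_counts(pairs):
    """Count, for each element, how many pairs it appears in.

    set(pair) collapses [i, i] to {i}, so such a pair adds one, not two.
    """
    counts = Counter()
    for pair in pairs:
        counts.update(set(pair))
    return counts

def sort_key(pair, counts):
    """Order by the rarer element's count first, then by the other count."""
    a, b = counts[pair[0]], counts[pair[1]]
    return (min(a, b), max(a, b))

pairs = [[0,0],[1,2],[1,4],[2,2],[2,3],[3,3],[3,4]]
counts = pair_counts(pairs)
ordered = sorted(pairs, key=lambda p: sort_key(p, counts))
print(ordered)
# [[0, 0], [1, 4], [1, 2], [3, 4], [2, 2], [2, 3], [3, 3]]
```

Because `sorted` is stable, pairs with equal keys keep their input order.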

My current solution

As I said, I believe this implementation works correctly, but it just feels like there should be a better way to do this. Anyway, here's what I have so far:

    #my implementation uncommented to reduce post size, see history for comments
    def sortPairList( data , n ):
        count = []
        for i in range(0,n):
            count.append( 0 )
        #count up the data
        for p in data:
            count[p[0]] += 1
            if p[1] != p[0]:
                count[p[1]] += 1
        maxcount = 0
        for i in range(0,n):
            if count[i] > maxcount:
                maxcount = count[i]
        def elementFrequency(p):
            if count[ p[0] ] < count[ p[1] ]:
                return count[ p[0] ] + float(count[ p[1] ]) / (maxcount+1)
            else:
                return count[ p[1] ] + float(count[ p[0] ]) / (maxcount+1)
        data.sort( key=elementFrequency )

Any suggestions for a more "Pythonic" way to do this?
Or is there something wrong with my current attempt?

New test case (see comments for answers)

    input:    [[0,0],[0,3],[0,5],[0,7],[1,1],[1,2],[1,8],[2,4],[2,5],[3,4],[3,5],[3,9],[4,4],[4,7],[4,8],[6,8],[7,7],[7,9],[8,9]]
    expected: [[6,8],[1,1],[1,2],[2,5],[0,5],[1,8],[3,5],[3,9],[7,9],[8,9],[2,4],[0,0],[0,3],[0,7],[7,7],[3,4],[4,7],[4,8],[4,4]]

+4

python sorting

DMA57361 Jul 19 '10 at 10:08

4 answers

    >>> n = 4
    >>> freqs = {i: sum(i in j for j in inp) for i in range(n+1)}
    >>> def key(x):
    ...     a, b = x
    ...     return min(freqs[a], freqs[b]), max(freqs[a], freqs[b])
    ...
    >>> sorted(inp, key=key)

P.S. Please note that input is a bad variable name, as it shadows the built-in function.
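The snippet above does not show how `inp` is defined. A self-contained version, assuming `inp` holds the question's first example, might look like this. Note that `n` here is the largest element value (hence `range(n + 1)`), not the exclusive upper bound used in the question.

```python
# Assumption: inp is the example input from the question
inp = [[0,0],[1,2],[1,4],[2,2],[2,3],[3,3],[3,4]]
n = 4  # largest element value in inp

# "i in pair" is True once per pair, even for [i, i], so this counts pairs
freqs = {i: sum(i in pair for pair in inp) for i in range(n + 1)}

def key(pair):
    a, b = pair
    return min(freqs[a], freqs[b]), max(freqs[a], freqs[b])

result = sorted(inp, key=key)
print(freqs)   # {0: 1, 1: 2, 2: 3, 3: 3, 4: 2}
print(result)  # [[0, 0], [1, 4], [1, 2], [3, 4], [2, 2], [2, 3], [3, 3]]
```

Building `freqs` this way scans every pair once per element, which is fine for small inputs but slower than a single counting pass for large ones.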

+1

Silentghost Jul 19 '10 at 11:05

While KennyTM's solution works, I tried to do it myself.

My solution pre-calculates the frequencies and stores them in a dictionary, where str(n) is the key. I had problems converting the comparison function known from Python 2 into the key function used with Python 3, but I found a recipe for that on ActiveState Code.

    # functools.cmp_to_key is available in Python 2.7 and 3.2+;
    # on older versions, use the ActiveState recipe linked below
    from functools import cmp_to_key

    item_cnt = {}

    def icount(n):
        return item_cnt[str(n)]

    def add_item(n):
        sn = str(n)
        try:
            item_cnt[sn] += 1
        except KeyError:
            item_cnt[sn] = 1

    # sort callback
    def cmp_items(ij, kl):
        i, j = ij
        k, l = kl
        if icount(i) < icount(k) or icount(i) == icount(k) and icount(j) < icount(l):
            return -1
        return 1

    input = [[0,0],[1,2],[1,4],[2,2],[2,3],[3,3],[3,4]]

    # count all items
    for (i, j) in input:
        add_item(i)
        add_item(j)

    # works with Python 2.x
    #input.sort(cmp_items)

    # works with Python2.6 and Python 3.x
    # to convert compare function to key look at:
    # http://code.activestate.com/recipes/576653-convert-a-cmp-function-to-a-key-function/
    input.sort(key=cmp_to_key(cmp_items))
    print(input)

0

Michał Niklas Jul 19 '10 at 11:08

Similar to KennyTM's solution, but for Python 2.5 or higher:

    import collections

    def sort_by_occurence(sequences):
        tally = collections.defaultdict(int)
        for sequence in sequences:
            for item in sequence:
                tally[item] += 1
        sequences.sort(key=lambda x:map(tally.get, x))

    pair_list = [[0,0],[1,2],[1,4],[2,2],[2,3],[3,3],[3,4]]
    sort_by_occurence(pair_list)
    print pair_list

0

pillmuncher Jul 19 '10 at 18:54

kennytm · Accepted Answer · 2010-07-19T10:27:19+0000

I would most likely use a Counter (this requires Python ≥2.7 or ≥3.1).

    from collections import Counter
    from itertools import chain

    def sortPairList2(data):
        tally = Counter(chain(*map(set, data)))
        data.sort(key=lambda x: sorted(tally[i] for i in x))
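As a quick check, the function above can be run against both inputs from the question. The sketch below repeats the function so that it is self-contained; the names `small` and `large` are just labels chosen here for the two test cases.

```python
from collections import Counter
from itertools import chain

def sortPairList2(data):
    # set(pair) collapses [i, i] to {i}, so each pair counts an element once
    tally = Counter(chain(*map(set, data)))
    # the key is the pair's two counts in ascending order, e.g. [2, 3]
    data.sort(key=lambda x: sorted(tally[i] for i in x))

small = [[0,0],[1,2],[1,4],[2,2],[2,3],[3,3],[3,4]]
sortPairList2(small)  # sorts in place
print(small)

large = [[0,0],[0,3],[0,5],[0,7],[1,1],[1,2],[1,8],[2,4],[2,5],[3,4],
         [3,5],[3,9],[4,4],[4,7],[4,8],[6,8],[7,7],[7,9],[8,9]]
sortPairList2(large)
print(large)
```

Both printed lists match the expected outputs given in the question. Since `list.sort` is stable, pairs with equal keys stay in their input order, which is why the result matches exactly rather than only up to ties.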

Note that:

You can create an anonymous function with lambda. For instance,
```
>>> c = 4
>>> a = lambda p: p - c
>>> a(7)
3
```
The sort key does not have to be a number. Any comparable value can be used as the return value of the key function. In my code, a list is used for ordering.
Python has many simpler idioms for what your original code does.
- count can be initialized with count = [0] * n instead of that loop.
- maxcount can be obtained using the max function: maxcount = max(count)
List comprehensions are used a lot in Python. If your goal is to transform one iterable into another, prefer a comprehension over a loop.
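Put together, the idioms in these notes might look like the sketch below. It reuses the question's first example and assumes n = 5 for it (elements 0 through 4); the variable names `keys` and `ordered` are chosen here for illustration.

```python
pairs = [[0,0],[1,2],[1,4],[2,2],[2,3],[3,3],[3,4]]
n = 5  # assumption: elements satisfy 0 <= i <= j < n

# Initialise the counts without an explicit append loop
count = [0] * n
for i, j in pairs:
    count[i] += 1
    if j != i:          # a pair [i, i] counts i only once
        count[j] += 1

# max() replaces the manual "find the largest" loop
maxcount = max(count)

# A comprehension builds one list from another; each key is itself a list
keys = [sorted(count[e] for e in pair) for pair in pairs]

# Lists compare element by element, so they work directly as sort keys
ordered = sorted(pairs, key=lambda pair: sorted(count[e] for e in pair))
print(ordered)
# [[0, 0], [1, 4], [1, 2], [3, 4], [2, 2], [2, 3], [3, 3]]
```

With a list (or tuple) as the key, `maxcount` is no longer needed to pack two counts into one float.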
