Python3 style sorting - a function of the old cmp method in the new key mechanism?

I read about a wrapper function to move the cmp style comparison to the key style comparison in Python 3, where the cmp feature was removed.

I have a hell of a time wrapping my head around how the sorted () function with the Python3 direct key, with at least as far as I understand only one element specified for the key, can allow you to correctly compare, for example, two IPs for order. Or the boor calls.

While with cmp he had nothing: sorted () and sort () called you with two ips, you looked at the relevant parts, made your decisions, made.

def ipCompare(dqA,dqB): ... ipList = sorted(ipList,cmp=ipCompare) 

Same thing with ham radio. Sorting is not alphabetical; calls usually mean letter + number + letter (s); the first sort priority is the number number, then the first letter (s), then the last letter (s.)

Using cmp ... no sweat.

 def callCompare(callA,callB): ... hamlist = sorted(hamlist,cmp=callCompare) 

With Python3 ... without going through the wrapper wrap wrapper ... and passing in one element ... I think ... how can this be done?

And if a shell is absolutely necessary ... then why remove cmp inside Python3 first?

I'm sure I missed something. I just can't see it.: /


OK, now I know what is missing. IPs solutions are provided in the answers below. Here is the key that I came up with to sort the shabby calls of a common prefix, region, postfix:

 import re def callKey(txtCall): l = re.split('([0-9]*)',txtCall.upper(),1) return l[1],l[0],l[2] hamList = ['N4EJI','W1AW','AA7AS','na1a'] sortedHamList = sorted(hamList,key=callKey) 

sortedHamList result ['na1a','W1AW','N4EJI','AA7AS']

More details:

  • AA7AS leaves callKey() as 7,AA,AS
  • N4EJI exits callKey() as 4,N,EJI
  • W1AW exits callKey() as 1,W,AW
  • na1a leaves callKey() as 1,NA,A
+1
source share
3 answers

First, if you have not read the Sorting HOWTO , be sure to read this; this explains a lot that may not be reasonable at first.


For your first example, two IPv4 addresses, the answer is pretty simple.

To compare two addresses, you need to do one of the obvious things: convert them both from four-dot strings and from four strings, and then simply compare the tuples:

 def cmp_ip(ip1, ip2): ip1 = map(int, ip1.split('.')) ip2 = map(int, ip2.split('.')) return cmp(ip1, ip2) 

Better still is to convert them to some kind of object that represents an IP address and has comparison operators. In 3.4+, stdlib has such an object inline; let it look 2.7 too:

 def cmp_ip(ip1, ip2): return cmp(ipaddress.ip_address(ip1), ipaddress.ip_address(ip2)) 

Obviously, this is even simpler than key functions:

 def key_ip(ip): return map(int, ip.split('.')) def key_ip(ip): return ipaddress.ip_address(ip) 

For your second example of the mention of ham radio operators: in order to write the cmp function, you should be able to split each ham address into letters, numbers, letters, then compare the numbers, then compare the first letters, then compare the two letters. To write the key function, you must be able to split the ham address into letters, numbers, letters, and then return a tuple (numbers, first letters, letters). Again, the key function is actually simpler, not harder.


And indeed, this applies to most of the examples that anyone could come up with. The most complex comparisons ultimately boil down to a complex transformation into a sequence of parts, and then to a simple lexicographic comparison of this sequence.

This is why cmp functions are deprecated in 2.4 and permanently removed in 3.0.


Of course, there are some cases where the cmp function is easier to read - most of the examples that people try to come up with are erroneous, but there are some. And there is also code that has been running for 20 years, and no one wants to rethink it under new conditions without any benefit. For these cases, you have cmp_to_key .


In fact, another reason for cmp was obsolete, on top of this and possibly a third.

In Python 2.3, types had the __cmp__ method, which was used to process all statements. In 2.4, they replaced the six methods __lt__ , __eq__ , etc. This allows you to increase flexibility, for example, you can have types that are not fully ordered. So, 2,3 when comparing a < b , it actually did a.__cmp__(b) < 0 , which pretty obviously displays the cmp argument. But in version 2.4+ a < b does a.__lt__(b) , which is not. This has confused many people over the years and removed both the __cmp__ and cmp arguments to sort functions removed from this confusion.

Meanwhile, if you read HOWTO Sort, you will notice that before we had cmp , the only way to do this is to decorate-sort-undecorate (DSU). Note that the blindly explicit idea of ​​a good key function for good DSU sorting and vice versa, but this is definitely not obvious with the cmp function. I don’t remember anyone explicitly mentioning this on the py3k list, but I suspect that people may have had this in their heads when deciding whether to kill cmp permanently.

+5
source

To use the new key argument , simply decompose the comparison into another object that already implements a well-ordered comparison, such as a tuple or list (for example, a sequence of integers). These types work well because they are ordered in order.

 def ip_as_components (ip): return map(int, ip.split('.')) sorted_ips = sorted(ips, key=ip_as_components) 

The order of each component coincides with the individual tests, which can be found in the traditional comparison function and then the comparison.

Looking at the HAM order, it might look like this:

 def ham_to_components (ham_code): # .. decompose components based on ordering of each return (prefix_letters, numbers, postfix_letters) 

The key approach (similar to the "order by" found in other languages) is usually a simpler and more natural construct related to - assuming that the original types are no longer ordered. The main drawback of this approach is that the partially reverse (e.g. asc then desc) ordering can be complicated, but it is solvable by returning nested tuples, etc.

In Py3.0, the cmp parameter has been completely removed (as part of a larger effort to simplify and unify the language, eliminating the conflict between rich comparisons and the __cmp__() magic method).

If you absolutely need to sorted with a custom "cmp", cmp_to_key can be used trivially.

 sorted_ips = sorted(ips, key=functools.cmp_to_key(ip_compare)) 
+4
source

According to official docs - https://docs.python.org/3/howto/sorting.html#the-old-way-using-the-cmp-parameter

When porting code from Python 2.x to 3.x, a situation may arise if you have a user who provides a comparison function and you need to convert it to a key function. The following shell makes this easy:

 def cmp_to_key(mycmp): 'Convert a cmp= function into a key= function' class K: def __init__(self, obj, *args): self.obj = obj def __lt__(self, other): return mycmp(self.obj, other.obj) < 0 def __gt__(self, other): return mycmp(self.obj, other.obj) > 0 def __eq__(self, other): return mycmp(self.obj, other.obj) == 0 def __le__(self, other): return mycmp(self.obj, other.obj) <= 0 def __ge__(self, other): return mycmp(self.obj, other.obj) >= 0 def __ne__(self, other): return mycmp(self.obj, other.obj) != 0 return K 

To convert to a key function, simply wrap the old comparison function:

 >>> def reverse_numeric(x, y): ... return y - x >>> sorted([5, 2, 4, 1, 3], key=cmp_to_key(reverse_numeric)) [5, 4, 3, 2, 1] 
+1
source

All Articles