Sorting a complex Python dictionary by only one of its values

I am writing a small optimization tool for buying stamps at the post office.

In the process, I use a dictionary that I sort following what I found in this other “famous” question: Sort Python dictionary by value

In my case, the dictionary is slightly more complex:
- a four-element tuple as the key
- and another tuple of five elements as the data.

The dictionary is built in a loop, where each successful iteration adds one entry:

MyDicco[A, B, C, D] = eval, post, number, types, over 

This is just a tiny example of a trivial run using 75 cents:
{
(0, 0, 1, 1): (22, 75, 2, 2, 0),
(0, 0, 0, 3): (31, 75, 3, 1, 0),
(0, 0, 2, 0): (2521, 100, 2, 1, 25),
(0, 1, 0, 0): (12511, 200, 1, 1, 125),
(1, 0, 0, 0): (27511, 350, 1, 1, 275)
}

So far, I'm using this code for sorting (works):

 import operator
 MyDiccoSorted = sorted(MyDicco.items(), key=operator.itemgetter(1))

I sort by my evaluation score, because sorting is all about bringing the best solution to the top. The evaluation score is just one item in a tuple of five (in the example, the evaluation scores are 22, 31, 2521, 12511 and 27511).

As you can see in the example above, it sorts (as I want) by the second tuple, index 1. But I had to (grudgingly) move my evaluation score to the front of that second tuple. Apparently the code uses the entire second tuple for the sort, which is heavy and unnecessary.


Here is my question: how can I sort more precisely? I do not want to sort by the entire second tuple of my dictionary: I want to target just its first element.
And ideally, I would like to move this value back to its original position, namely the last element of the second tuple, and still sort by it.


I read about and experimented with the operator.itemgetter() syntax, but I was not able to just “grab” the “first element of my second element”. https://docs.python.org/3/library/operator.html?highlight=operator.itemgetter#operator.itemgetter

(note: it is permissible to use tuples as keys and values, in accordance with:
https://docs.python.org/3/tutorial/datastructures.html?highlight=dictionary and they work fine for my project; this question is about better sorting)


For those who like a little background (you will yell at me that I should use some other method, but for now I am learning dictionaries, which is one of the goals of this project):

This optimization is intended for developing countries, where certain stamp values are often unavailable or limited in stock at any given post office. It will later run on Android phones.

We do regular mailings (yes, letters). Working out the exact postage for each destination from the available stamp values, and finding solutions when certain values are low in stock, is not a trivial process when you consider six different mail items based on the destination and hundreds of letters to mail.

There are other modules that help turn a theoretical optimal solution into something you can actually buy on any given day, with a strategic dialogue guide ...

About the dictionary in this question: I loop over all reasonable combinations of stamp values (high enough to cover the required postage, and overpaying by no more than a fraction of one stamp).
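
A minimal sketch of such an enumeration, not the asker's actual code. It assumes the four denominations from the example (350, 200, 50 and 25 cents) and reads the overpayment limit as "the overpayment must be smaller than the smallest stamp used". That reading is an assumption, although it does reproduce the five keys of the 75-cent sample. The names DENOMINATIONS and reasonable_combinations are hypothetical.

```python
from itertools import product

# Stamp values in cents, in the same order as the key tuple in the question.
DENOMINATIONS = (350, 200, 50, 25)


def reasonable_combinations(postage):
    """Yield (counts, total, overpayment) for each plausible stamp mix.

    Assumed rule: the mix covers the postage, and removing its smallest
    stamp would no longer cover it (overpayment < smallest stamp used).
    """
    # For each denomination, never try more stamps than would cover
    # the postage on their own (ceiling division, plus zero).
    ranges = [range(-(-postage // value) + 1) for value in DENOMINATIONS]
    for counts in product(*ranges):
        if not any(counts):
            continue  # skip the empty mix
        total = sum(n * value for n, value in zip(counts, DENOMINATIONS))
        smallest_used = min(v for n, v in zip(counts, DENOMINATIONS) if n)
        overpayment = total - postage
        if 0 <= overpayment < smallest_used:
            yield counts, total, overpayment


for counts, total, overpayment in reasonable_combinations(75):
    print(counts, total, overpayment)
```

For 75 cents this yields the same five stamp mixes that appear as keys in the sample dictionary.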

Then I calculate a “success” value based on the number of stamps required (high priority), the number of stamp types required (lower priority, since buying different stamps takes extra time at the counter), and a very high penalty for overpaying. Thus, the lowest value means the highest success.
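
The exact formula is not given in the question. As a hedged illustration only: the weights below (100 per cent of overpayment, 10 per stamp, 1 per stamp type) are inferred from the five sample rows and happen to reproduce their scores. The function name evaluation_score is hypothetical, and the real tool may compute its score differently.

```python
def evaluation_score(stamps, types, overpayment):
    """Lower is better. The weights are inferred from the sample data,
    not taken from the asker's code."""
    return 100 * overpayment + 10 * stamps + types


# Sample rows from the question: (score, postage, stamps, types, overpayment)
sample_rows = [
    (22, 75, 2, 2, 0),
    (31, 75, 3, 1, 0),
    (2521, 100, 2, 1, 25),
    (12511, 200, 1, 1, 125),
    (27511, 350, 1, 1, 275),
]

# Recompute each score from the other fields of its row.
recomputed = [
    evaluation_score(stamps, types, overpayment)
    for _score, _postage, stamps, types, overpayment in sample_rows
]
print(recomputed)
```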

I collect all reasonable “solutions” in a dictionary, where the tuple of required stamps serves as the key and another tuple of result data serves as the value. It is slightly over-specified, because a human needs to read it at this stage of the project (for debugging).

If you are interested and want to read the example (first line), the columns are:

  • number of stamps at 350 cents
  • number of stamps at 200 cents
  • number of stamps at 50 cents
  • number of stamps at 25 cents
  • evaluation score
  • calculated postage applied
  • total number of stamps applied
  • total number of stamp types
  • overpayment in cents, if any

Or in words (assuming the postal service offers stamps of 350, 200, 50 and 25 cents): I can apply postage of 75 cents using 1 x 50 cents and 1 x 25 cents. This gives me a success rating of 22 (the best in this list); the postage is 75 cents, requiring two stamps of two different values, with an overpayment of 0 cents.

7 answers

You can just use a double index, something like this should work:

 MyDiccoSorted = sorted(MyDicco.items(), key=lambda s: s[1][2]) 

Just change the 2 to whatever index the score has in the tuple.


I find it easier to use lambda expressions than remembering various operator functions.

Assuming for the moment that your eval score is the third element of your value tuple (i.e. (post, number, eval, types, over)):

 MyDiccoSorted = sorted(MyDicco.items(), key=lambda x: x[1][2])

Alternatively, you can create a named function to do the job:

 def myKey(x):
     return x[1][2]

 MyDiccoSorted = sorted(MyDicco.items(), key=myKey)

You can use a lambda expression instead of operator.itemgetter() to pick the exact element to sort by. This assumes your eval is the first element of the value tuple; otherwise, replace the 0 in x[1][0] with the index of the element you want. Example -

 MyDiccoSorted = sorted(MyDicco.items(), key=lambda x: x[1][0]) 
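
To make the difference from operator.itemgetter(1) visible, here is a small sketch with made-up data in which two entries share the score 22 (the second entry is hypothetical and only there to force a tie). It also shows why itemgetter(1, 0) is not a nested lookup.

```python
from operator import itemgetter

# Hypothetical data in the question's layout (score first); two entries tie on 22.
candidates = {
    (0, 0, 1, 1): (22, 75, 2, 2, 0),
    (0, 1, 0, 0): (22, 60, 9, 9, 9),
    (0, 0, 0, 3): (31, 75, 3, 1, 0),
}

# itemgetter(1) compares the whole value tuple, so a tie on the score
# is broken by the remaining fields (60 sorts before 75 here).
by_whole_value = sorted(candidates.items(), key=itemgetter(1))

# The lambda compares only the score; tied entries keep their original order.
by_score_only = sorted(candidates.items(), key=lambda kv: kv[1][0])

# itemgetter(1, 0) does not mean kv[1][0]: it returns the pair (kv[1], kv[0]).
pair = itemgetter(1, 0)(((0, 0, 1, 1), (22, 75, 2, 2, 0)))

print([key for key, _ in by_whole_value])
print([key for key, _ in by_score_only])
print(pair)
```

So the lambda does less comparison work per item and never lets the other fields influence the order.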

How it works -

dict.items() returns something similar to a list of tuples (although in Python 3.x it is a view object rather than an actual list). Example -

 >>> d = {1:2,3:4}
 >>> d.items()
 dict_items([(1, 2), (3, 4)])

Now, in sorted(), the key argument takes a function object (which can be a lambda, operator.itemgetter(), which also returns a function, or any plain function). The function you pass to key must take one argument, which will be an element of the list being sorted.

Then the key function is called with each element, and it must return the value to sort that element by. An example to help you understand this -

 >>> def foo(x):
 ...     print('x =',x)
 ...     return x[1]
 ...
 >>> sorted(d.items(),key=foo)
 x = (1, 2)
 x = (3, 4)
 [(1, 2), (3, 4)]

Does this do what you need?

 sorted(MyDicco.items(), key=lambda x: x[1][0]) 

 index_of_evaluation_score = 0
 MyDiccoSorted = sorted(MyDicco.items(), key=lambda key_value: key_value[1][index_of_evaluation_score])

With the evaluation score placed at the end, where you wanted it, you can use the following:

 MyDicco = {
     (0, 0, 1, 1): (75, 2, 2, 0, 22),
     (0, 0, 0, 3): (75, 3, 1, 0, 31),
     (0, 0, 2, 0): (100, 2, 1, 25, 2521),
     (0, 1, 0, 0): (200, 1, 1, 125, 12511),
     (1, 0, 0, 0): (350, 1, 1, 275, 27511)}
 MyDiccoSorted = sorted(MyDicco.items(), key=lambda x: x[1][4])
 print(MyDiccoSorted)

This gives:

 [((0, 0, 1, 1), (75, 2, 2, 0, 22)), ((0, 0, 0, 3), (75, 3, 1, 0, 31)), ((0, 0, 2, 0), (100, 2, 1, 25, 2521)), ((0, 1, 0, 0), (200, 1, 1, 125, 12511)), ((1, 0, 0, 0), (350, 1, 1, 275, 27511))] 

I think one of the things you may be looking for is a stable sort.

Python's built-in sort functions are “stable” sorts. For example, if you sort:

  1 4 6
  2 8 1
  1 2 3
  2 1 8

by its first column, you get:

  1 4 6
  1 2 3
  2 8 1
  2 1 8

The order of rows that have the same value in column 1 does not change: 1 4 6 comes before 1 2 3 because that was the original order of these rows before sorting on column 1. Sorting has been "stable" since Python version 2.2. More details here.
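
A short sketch of this behaviour, taking the numbers above as four rows of three columns. The second half shows a common use of stability that goes slightly beyond the example: sorting in two passes, secondary key first, to get a multi-level order.

```python
rows = [(1, 4, 6), (2, 8, 1), (1, 2, 3), (2, 1, 8)]

# Sort on the first column only; rows that tie keep their original order.
by_first_column = sorted(rows, key=lambda row: row[0])
print(by_first_column)

# Two-pass sort: first by the secondary key (column 2), then by the
# primary key (column 1). Stability preserves the secondary order
# among rows that tie on the primary key.
by_second_column = sorted(rows, key=lambda row: row[1])
two_pass = sorted(by_second_column, key=lambda row: row[0])
print(two_pass)
```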

On another note, I find it interesting how much you had to explain your code. That is a sign that the code would benefit from refactoring to make its purpose clearer.

Named tuples can be used to remove the hard-to-read tuple indices that you see in many answers here, for example, key=lambda x: x[1][0] - what does this really mean? What does it do?

Here's a version using named tuples that helps readers (most importantly, you!) understand what your code is trying to do. Note that the lambda now explains itself much better.

 from collections import namedtuple

 StampMix = namedtuple('StampMix', ['c350', 'c200', 'c50', 'c25'])
 Stats = namedtuple('Stats', ['score', 'postage', 'stamps', 'types', 'overpayment'])

 data = {
     (0, 0, 1, 1): (22, 75, 2, 2, 0),
     (0, 0, 0, 3): (31, 75, 3, 1, 0),
     (0, 0, 2, 0): (2521, 100, 2, 1, 25),
     (0, 1, 0, 0): (12511, 200, 1, 1, 125),
     (1, 0, 0, 0): (27511, 350, 1, 1, 275)
 }

 candidates = {}
 for stampmix, stats in data.items():
     candidates[StampMix(*stampmix)] = Stats(*stats)

 print(sorted(candidates.items(), key=lambda candidate: candidate[1].score))

You can see the benefits of this approach in the output:

 >>> python namedtuple.py
 (prettied-up output follows...)
 [
     (StampMix(c350=0, c200=0, c50=1, c25=1), Stats(score=22, postage=75, stamps=2, types=2, overpayment=0)),
     (StampMix(c350=0, c200=0, c50=0, c25=3), Stats(score=31, postage=75, stamps=3, types=1, overpayment=0)),
     (StampMix(c350=0, c200=0, c50=2, c25=0), Stats(score=2521, postage=100, stamps=2, types=1, overpayment=25)),
     (StampMix(c350=0, c200=1, c50=0, c25=0), Stats(score=12511, postage=200, stamps=1, types=1, overpayment=125)),
     (StampMix(c350=1, c200=0, c50=0, c25=0), Stats(score=27511, postage=350, stamps=1, types=1, overpayment=275))
 ]

and this will help with your algorithms too. For instance:

 def score(stats):
     return stats.postage * stats.stamps * stats.types + 1000 * stats.overpayment