Reversible dictionary for python

I'd like to store some data in Python in a form similar to a dictionary: {1:'a', 2:'b'}. Every value will be unique, not just among the other values but among the keys as well.

Is there a simple data structure that I can use to get the corresponding object, regardless of whether I ask using the "key" or "value"? For example:

    >>> a = {1:'a', 2:'b'}
    >>> a[1]
    'a'
    >>> a['b']
    2
    >>> a[3]
    KeyError

"Keys" are standard Python collections; values are short (<256char) strings.

My current solution is to build a reversed dictionary and search it if I cannot find a result in the original dictionary:

    # Reverse index: value -> key
    pointsreversed = dict((v, k) for k, v in points.items())

    def lookup(key):
        # Try the forward dict first, then fall back to the reverse one.
        return points.get(key) or pointsreversed[key]

This uses twice as much space, which isn't great (my dictionaries can be up to a few hundred megabytes), and it is on average 50% slower.

EDIT: as several answers point out, two dicts do not double the memory usage, since only the dictionary structure is duplicated, not the items stored in it.

Is there a solution to improve this?

+6
python dictionary hashtable
6 answers

Related posts:

Python Inverse Transformation

Python mapping 1: 1

Surely, if all values and keys are unique, you could just use one dictionary and insert both key: value and value: key to begin with?

+8

If your keys and values do not overlap, one obvious approach is to simply store them in the same dict, i.e.:

    class BidirectionalDict(dict):
        def __setitem__(self, key, val):
            dict.__setitem__(self, key, val)
            dict.__setitem__(self, val, key)

        def __delitem__(self, key):
            dict.__delitem__(self, self[key])
            dict.__delitem__(self, key)

    d = BidirectionalDict()
    d['foo'] = 4
    print(d[4])   # Prints 'foo'

(You might also want to implement things like the __init__ , update and iter* methods to act like a real dict, depending on how much functionality you need).
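As a rough illustration of that remark, here is one way the class might be extended, written for Python 3. This is only a sketch under the same assumption that keys and values never overlap; the `__init__` and `update` overrides and the stale-pair cleanup in `__setitem__` are additions of this example, not part of the answer's code.

```python
class BidirectionalDict(dict):
    """Stores every pair in both directions; assumes keys and values never overlap."""

    def __init__(self, *args, **kwargs):
        super().__init__()
        # Route initial data through __setitem__ so reverse entries are created.
        self.update(*args, **kwargs)

    def __setitem__(self, key, val):
        # Remove any existing pair involving key or val, so no stale entry survives.
        for old in (key, val):
            if old in self:
                del self[old]
        super().__setitem__(key, val)
        super().__setitem__(val, key)

    def __delitem__(self, key):
        val = self[key]
        super().__delitem__(key)
        # pop() with a default tolerates the degenerate case key == val.
        dict.pop(self, val, None)

    def update(self, *args, **kwargs):
        # dict.update() bypasses an overridden __setitem__, so loop explicitly.
        for key, val in dict(*args, **kwargs).items():
            self[key] = val


d = BidirectionalDict({1: 'a', 2: 'b'})
d[1] = 'c'          # replaces the pair 1 <-> 'a' with 1 <-> 'c'
print(d['c'], 'a' in d)
```

Note that `len(d)` and iteration still see both directions, so a dict holding n pairs reports 2n entries.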

This should involve only one lookup, though it may not save you much memory (you still have twice as many dict entries, after all). Note, however, that neither this nor your original approach will use twice as much space: the dict only takes up space for the references (effectively pointers), plus some overallocation overhead. The space taken up by your data itself will not be duplicated, since both entries point to the same objects.

+10

In The Art of Computer Programming, Volume 3, Knuth has a section on lookups by secondary keys. For the purposes of your question, the value can be considered the secondary key.

The first suggestion is to do what you have done: build an efficient index of the keys by value.
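A minimal Python 3 sketch of that first suggestion, assuming strictly one-to-one data: a primary dict plus a secondary index keyed by value, wrapped so the two can never drift apart. The class name `TwoWayIndex` and its methods are hypothetical, chosen for this example; they come neither from Knuth nor from any library.

```python
class TwoWayIndex:
    """Primary dict plus a secondary index keyed by value (hypothetical helper)."""

    def __init__(self, pairs=()):
        self._forward = {}   # key -> value
        self._inverse = {}   # value -> key (the secondary index)
        for key, value in pairs:
            self.add(key, value)

    def add(self, key, value):
        # Refuse anything that would break the one-to-one mapping.
        if key in self._forward or value in self._inverse:
            raise ValueError("key or value already present")
        self._forward[key] = value
        self._inverse[value] = key

    def remove(self, key):
        # Delete from both dicts so the index never goes stale.
        value = self._forward.pop(key)
        del self._inverse[value]

    def lookup(self, item):
        # One hash probe when item is a key, two when it is a value.
        try:
            return self._forward[item]
        except KeyError:
            return self._inverse[item]


index = TwoWayIndex([(1, "a"), (2, "b")])
print(index.lookup(1), index.lookup("b"))
```

An item found in neither dict raises `KeyError`, matching the behaviour asked for in the question.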

The second suggestion is to set up a large B-tree that acts as a composite index of the clustered data, where the branch nodes contain the values and the leaves contain the key data and pointers to the larger record (if there is one).

If the data is geometric (as yours appears to be), there are things called post-office trees, which can answer questions such as "what is the nearest object to point x?". A few examples are given here: http://simsearch.yury.name/russir/01nncourse-hand.pdf Another simple option for this kind of query is the quadtree or the k-d tree. http://en.wikipedia.org/wiki/Quadtree
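To make the nearest-object query concrete, here is a small k-d tree sketch in Python 3. It is an illustration only, assuming the points are tuples of numbers held in memory; the function names `build_kdtree`, `nearest` and `sq_dist` are made up for this example, and a real workload would more likely use an existing spatial-index library.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_kdtree(points, depth=0):
    """Build a k-d tree as nested (point, left, right) tuples."""
    if not points:
        return None
    axis = depth % len(points[0])          # cycle through the coordinates
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2                 # median point splits the space
    return (points[mid],
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))


def nearest(node, target, depth=0, best=None):
    """Return the stored point closest to target."""
    if node is None:
        return best
    point, left, right = node
    if best is None or sq_dist(point, target) < sq_dist(best, target):
        best = point
    axis = depth % len(target)
    diff = target[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, target, depth + 1, best)
    # Search the far side only if the splitting plane is closer than the best hit.
    if diff * diff < sq_dist(best, target):
        best = nearest(far, target, depth + 1, best)
    return best


points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build_kdtree(points)
print(nearest(tree, (9, 2)))   # prints (8, 1)
```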

A final option is combinatorial hashing, where you combine the key and the value into a special kind of hash that lets you do efficient lookups even when you do not have both values. I could not find a good explanation of combinatorial hashing online, but it is in TAoCP, Volume 3, Second Edition, on page 573.

Granted, for some of these you may have to write your own code. But if memory or performance is really key, you might want to take the time.

+3

It should not use "twice the space". Dictionaries just store references to data, not the data itself. So, if you have a million strings taking up a billion bytes, then each dictionary takes perhaps an extra 10-20 million bytes, a tiny fraction of the overall storage. Using two dictionaries is the right thing to do.
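One way to check this yourself in Python 3 is sketched below. The sizes are made up for the example (10,000 strings of about 200 characters), and `sys.getsizeof` on a dict reports only the container, not the objects it refers to, which is exactly the distinction being made here. Actual numbers depend on the interpreter, so none are quoted.

```python
import sys

# Hypothetical data set: int keys mapped to ~200-character strings.
points = {i: "%0200d" % i for i in range(10000)}
points_reversed = {v: k for k, v in points.items()}

# Every key of the reversed dict is the very same string object that the
# forward dict holds as a value: a shared reference, not a copy.
shares_objects = all(points[k] is v for v, k in points_reversed.items())

container_bytes = sys.getsizeof(points_reversed)            # hash table only
payload_bytes = sum(sys.getsizeof(v) for v in points.values())

print("strings shared, not copied:", shares_objects)
print("extra dict container bytes:", container_bytes)
print("string payload bytes:      ", payload_bytes)
```

The longer the strings, the smaller the second dictionary's share of total memory; with very short values the container overhead becomes more noticeable.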

+1

Insert the reversed (value, key) pairs into the same dict:

    a = {1:'a', 2:'b'}
    a.update(dict((v, k) for k, v in a.items()))

Then you can look things up in either direction, as you require:

    print(a[1])
    print(a['a'])
0

Here's another solution using a user-defined class.

And the code ...

    # search a dictionary for key or value
    # using named functions or a class
    # tested with Python25 by Ene Uran 01/19/2008

    def find_key(dic, val):
        """return the key of dictionary dic given the value"""
        return [k for k, v in dic.items() if v == val][0]

    def find_value(dic, key):
        """return the value of dictionary dic given the key"""
        return dic[key]

    class Lookup(dict):
        """
        a dictionary which can lookup value by key, or keys by value
        """
        def __init__(self, items=()):
            """items can be a list of pair_lists or a dictionary"""
            dict.__init__(self, items)

        def get_key(self, value):
            """find the key(s) as a list given a value"""
            return [item[0] for item in self.items() if item[1] == value]

        def get_value(self, key):
            """find the value given a key"""
            return self[key]
0