Mapping std :: map in Python

Sometimes it makes sense to have a dictionary with a key. In C ++, this is often implemented with a red-black tree. But any self-balancing binary search tree will do (fwiw, Knuth is especially clear on this). The best solution I've been able to come up with so far is to adopt R. McGraw AVL-tree type and create a wrapper class that basically implements the STL map interface (also relying on convenient ordering of pairs (two tuple elements) in Python). Such a tuple basically corresponds to std :: map :: value_type.

Yes, there is a Python bisect module, and although it is logarithmic during insertion just as self-balancing binary trees are logarithmic during insertion (right?), To be honest, I just want an object. OrderedDict or something like that is called (or not, Python 3.1 OrderedDict is not suitable - to order insert-time, and, frankly, that ordering insert-time is relevant to the order is not entirely obvious).

Note. A key-ordered dictionary is very useful in many industries (for example, in finance, usually for tracking prices for data books, which are mainly ordered by price dictionaries β†’ quantity, aggregated order information, etc.).

If anyone has any other ideas, that's great. All I know is that I just got five million times smarter answers from Alex Martelli. So I thought I would ask.

+6
python dictionary
source share
7 answers

As you said, you can do your own implementation with bisect:

class OrderedDict: def __init__(self, keyvalues_iter): self.__srtlst__ = sorted(keyvalues_iter) def __len__(self): return len(self.__srtlst__) def __contains__(self, key): index = bisect(self.__srtlst__, key) if index<len(self.__srtlst__) and self.__srtlst__[index][0] == key: return True else: return False def __getitem__(self, key): index = bisect(self.__srtlst__, key) if index<len(self.__srtlst__) and self.__srtlst__[index][0] == key: return self.__srtlst__[index][1] else: raise KeyError(key) def __setitem__(sekf, key, value): index = bisect(self.__srtlst__, key) if index<len(self.__srtlst__) and self.__srtlst__[index][0] == key: self.__srtlst__[index][1] = value else: self.__srtlst__[index]=(key, value) def __delitem__(sekf, key, value): index = bisect(self.__srtlst__, key) if index<len(self.__srtlst__) and self.__srtlst__[index][0] == key: del __srtlst__[index] else: raise KeyError(key) def __iter__(self): return (v for k,v in self.__srtlst__) def clear(self): self.__srtlst__ = [] def get(self, key, default=None): index = bisect(self.__srtlst__, key) if index<len(self.__srtlst__) and self.__srtlst__[index][0] == key: return self.__srtlst__[index][1] else: return default def items(self): return self.__srtlst__[:] def iteritems(self): return iter(self.__srtlst__) def iterkeys(self): return (k for k,v in self.__srtlst__) def itervalues(self): return (v for k,v in self.__srtlst__) def keys(self): return [k for k,v in self.__srtlst__] def values(self): return [v for k,v in self.__srtlst__] def setdefault(self, key, default): index = bisect(self.__srtlst__, key) if index<len(self.__srtlst__) and self.__srtlst__[index][0] == key: return self.__srtlst__[index][1] else: self.__srtlst__[index]=(key, default) return default def update(self, other): #a more efficient implementation could be done merging the sorted lists for k, v in other.iteritems(): self[k] = v 

hmmm ... it seems that I already did this for you, d'oh!

-one
source share

I have exactly the same need, and Alex Martelli's answer completely convinced me: it is best to store the dictionary and the list of partially sorted keys, and then sort when necessary. This is effective due to the very specific behavior of the python sorting algorithm (AKA Timsort). Key-ordered dict in Python

I tested its implementation and mine, and it was better (because it does not insert in the middle of the list)

(I highly recommend you read the article related to AM's comment on timsort, which is a gem).

+2
source share

Lists are a miserable replacement for wood.

Inserts should move the entire list around to make room; Removal is necessary to move the list back. Adding or removing material in batch mode is great when possible, but it very often does not happen, or takes unnatural distortions to organize it. The fundamental attribute of the tree is that the insertions and deletions are O (log n); no number of hands will turn O (n) into O (log n).

Inserting an element into a tree when you already know where it will go is O (1). Equivalently, removing an element from a tree based on its node is also equal to O (1). std :: map supports both of these options. This is O (n) with a list.

Another fundamental property of the tree is that iteration over a range of values ​​is O (1) for each iteration. Combining a list and a dict loses this because each iteration must perform a type search. (The tuple list routine does not have this problem.)

Trees are one of the most basic data types. The disadvantage of the tree container type Python is the wart. Maybe there is a third-party library that implements one (for example, one that is associated with the "Unknown" that I have not tried, so I can not guarantee), but there is no standard Python type for it.

+1
source share

I stumbled upon this question in need of an OrderedMap and found my horror that the accepted answer is complete garbage. So I rolled back if anyone finds this useful:

 from bisect import * class OrderedMap: """Much less efficient than a dict, but keys don't need to be hashable.""" __default_arg = object() def __init__(self, keyvalues_iter = None): self.clear() if keyvalues_iter is not None: self.update(keyvalues_iter) def clear(self): self.__keys = [] self.__values = [] def __index(self, key): if self.__keys: index = bisect(self.__keys, key)-1 if self.__keys[index] == key: return index raise KeyError(key) def __len__(self): return len(self.__keys) def __contains__(self, key): try: self.__index(key) return True except KeyError: return False def __getitem__(self, key): index = self.__index(key) return self.__values[index] def __setitem__(self, key, value): try: index = self.__index(key) # exists self.__values[index] = value except KeyError: # new index = bisect(self.__keys, key) self.__keys.insert(index, key) self.__values.insert(index, value) def __delitem__(self, key): index = self.__index(key) self.__keys.pop(index) self.__values.pop(index) def __iter__(self): return iter(self.__keys) def get(self, key, default=__default_arg): try: return self[key] except KeyError: if default != OrderedMap.__default_arg: return default raise def setdefault(self, key, default = None): try: return self[key] except KeyError: if default != OrderedMap.__default_arg: self[key] = default return default raise def items(self): return zip(self.__keys, self.__values) def iteritems(self): return iter((self.__keys[x], self.__values[x]) for x in xrange(len(self))) def keys(self): return self.__keys[:] def iterkeys(self): return iter(self.__keys) def values(self): return self.__values[:] def itervalues(self): return iter(self.__values) def update(self, other): for k, v in other.iteritems(): self[k] = v def __repr__(self): s = ", ".join("%s: %s" % (repr(self.__keys[x]), repr(self.__values[x])) for x in xrange(len(self))) return "OrderedMap{%s}" % (s,) 
+1
source share

You may be looking for SortedCollection: http://code.activestate.com/recipes/577197-sortedcollection/

+1
source share

For a list that is sorted, you can try the heapq module.

0
source share

The Python module sortedcontainers provides a SortedDict for this purpose. It uses a modified data structure such as a B-tree and is written in pure Python. The module has 100% testing coverage and hours of stress. Although pure-Python, it implements C implementations faster and has a performance comparison for backups.

Because it is pure-Python, installation is a breeze with pip:

 pip install sortedcontainers 

Then you simply:

 from sortedcontainers import SortedDict help(SortedDict) 

Performance comparison includes a fairly complete list of alternative implementations. It is worth a look.

0
source share

All Articles