Efficient structure of data storage objects sorted by several keys

I have a python program where I use a priority queue to keep track of the objects that will be processed. The queue is currently implemented using a SortedList, which works very well.

However, I need to extend this code so that the list is sorted by several keys. Just like an SQL database with indexes in multiple columns so that I can efficiently retrieve, add, and delete objects on all keys. My workload is add / remove heavy . To give an idea of ​​what I want to do, here are a few pseudo codes:

ml = MultiSortedList() ml.append((1, "z", 1.5), a) ml.append((2, "a", 40.0), b) ml.append((3, "f", 0.5), c) print(ml.sorted(0)) [((1, "z", 1.5), a), ((2, "a", 40.0), b), ((3, "f", 0.5), c),] print(ml.sorted(2)) [((3, "f", 0.5), c), ((1, "z", 1.5), a), ((2, "a", 40.0), b)] print(ml.sorted(2).pop(1) (1, "z", 1.5), a) print(ml.sorted(0)) [((2, "a", 40.0), b), ((3, "f", 0.5), c)] 

I can’t figure out how to do this effectively. Of course, I could sort the list for each access to another column, but it is too expensive. In addition, O(n) deletion operations in python lists become painful since the list can contain several thousand objects.

Is there an existing data structure (preferably already implemented in python) that solves this problem? If not, can you help me with an outline of how to effectively implement this?

+7
python heap sorting data-structures
source share
2 answers

Instead of using a sorted list as an implementation of your priority queue, you should use heap . For example, a binary heap has log(n) insertion and deletion. The Fibonacci heap will have an O(1) insert.

You have an implementation inside the standard library: heapq .

For your use case, you need to save several heaps: one for each sort order. For clarity of implementation, you may want to save your original data in a dictionary (possibly using random or increasing keys) and save your keys only in heaps.

Using this method, you can easily get O(1) insertion and O(log(n)) deletion. You do not specify what types of access you will have, but if you need random access, you can use the binary search in the corresponding heap to get O(log(n))

+5
source share

One approach is to maintain an n list inside, one for each sort order, and add / remove an item for each of them. Thus, you “only” multiply the time of manipulation of your data structure by a constant value (ie Using 3 keys instead of one, and then 3 log(n) instead of log(n) to remove / insert an element).

The concept that I present behind such an implementation is a Java comparator.

Create a comparator method for each key that will be used to sort them, and then use it when inserting and deleting time.

This will work as follows:

 class SortedList(list): def __init__(self, comparator = None): list.__init__(self) self.comparator = comparator def add(self, element): """ Adds the element into the sorted list. If no comparator where provided at initialisation, then try to compare the element to item already stored in the list using standard __gt__ and __eq__. Argument : element : the element to insert in the list /!\ NB : A more inteligent implementation should be used, such as binary search... /!\ """ index = 0 while (index < len(self)): if self.isGreatterThan(element, self[index]): index += 1 else: break self.insert(index, element) def remove(self, element): """ Same as previous, a better approach is possible that use binary search """ list.remove(self, element) def isGreatterThan(self, element, otherElement): """ Compare if element is greater than other element. """ if self.comparator != None: return self.comparator(element, otherElement) else: return element.__gt__(otherElement) class MultipleKeysObjectContainer(): def __init__(self, comparators): #register the comparators self.comparators = comparators #create a List for each comparator self.data = {} for key in self.comparators.keys(): self.data[key] = SortedList(comparator = self.comparators[key]) def add(self, element): for key in self.data.keys(): self.data[key].add(element) def __repr__(self): return "<MultipleKeysObjectContainer :"+self.data.__repr__()+">" def __str__(self): result = "MultipleKeysObjectContainer{\n" for key in self.data.keys(): result += "\tOrder by : "+key+"{\n" for element in self.data[key]: result += "\t\t" + str(element) + "\n" result += "\t}\n" result += "}" return result def popOrderedBy(self, key, position): """ pop the item a the position in the list of item ordered by the key. Remove also from other data containers. """ item = self.data[key].pop(position) for dataKey in self.data.keys(): if dataKey != key: self.data[dataKey].remove(item) return item if __name__ == "__main__": a = SortedList(lambda x,y : x[0][0] > y[0][0]) item1 = ((1, "z", 1.5),"foo") item2 = ((2, "a", 40.0), "bar") item3 = ((3, "f", 0.5), "barfoo") a.add(item1) a.add(item3) a.add(item2) print("Example of sorted list") print(a) a.remove(item3) print("The same list without the barfoo") print(a) b = MultipleKeysObjectContainer({"id": (lambda x,y : x[0][0] > y[0][0]), "letter": (lambda x,y : x[0][1] > y[0][1] ), "value":(lambda x,y : x[0][2] > y[0][2])}) b.add(item1) b.add(item3) b.add(item2) print("A multiple key container, object are ordered according three criterion.") print(b) print("Remove the first item if items are ordered by letter", b.popOrderedBy("letter", 0)) print("After this removing the container contains :") print(b) 

This leads to:

 Example of sorted list [((1, 'z', 1.5), 'foo'), ((2, 'a', 40.0), 'bar'), ((3, 'f', 0.5), 'barfoo')] The same list without the barfoo [((1, 'z', 1.5), 'foo'), ((2, 'a', 40.0), 'bar')] A multiple key container, object are ordered according three criterion. MultipleKeysObjectContainer{ Order by : id{ ((1, 'z', 1.5), 'foo') ((2, 'a', 40.0), 'bar') ((3, 'f', 0.5), 'barfoo') } Order by : value{ ((3, 'f', 0.5), 'barfoo') ((1, 'z', 1.5), 'foo') ((2, 'a', 40.0), 'bar') } Order by : letter{ ((2, 'a', 40.0), 'bar') ((3, 'f', 0.5), 'barfoo') ((1, 'z', 1.5), 'foo') } } Remove the first item if items are ordered by letter ((2, 'a', 40.0), 'bar') After this removing the container contains : MultipleKeysObjectContainer{ Order by : id{ ((1, 'z', 1.5), 'foo') ((3, 'f', 0.5), 'barfoo') } Order by : value{ ((3, 'f', 0.5), 'barfoo') ((1, 'z', 1.5), 'foo') } Order by : letter{ ((3, 'f', 0.5), 'barfoo') ((1, 'z', 1.5), 'foo') } } 

It looks like you're looking (almost just add a binary search: p) Good luck!

+1
source share

All Articles