Python linked list with O(1) insert/remove

I am looking for an implementation of a linked list and related algorithms for Python. Everyone I ask just recommends using Python's built-in lists, but performance measurements show that list insertion and removal is the bottleneck for our application. It is trivial to implement a simple linked list, but I wonder if there is a mature library that includes operations such as sorting, merging, splicing, searching, lower/upper bounds, etc.

I know this is a duplicate, but searching for a Python list on any search engine gives predictably poor results, as most people just say that linked lists are not needed in Python (pfft!).

PS: I need to insert and delete from anywhere on the list, not just at the ends.

OK, you asked for it: I need to maintain an ordered list of several hundred thousand entries. I will iterate over the list forwards (one by one), using a visitor on each entry, starting from the beginning or from a position found by a binary search. When an entry matching a predicate is found, it is removed from the list, and then another binary search is performed on a subset of the list, beginning at the removed entry's previous position and running up to a position determined statistically beforehand. Ignoring the error condition, the modified entry may be used to create another linked list, which is spliced into the new position found by the second binary search. Iteration then continues from the position where the entry was removed. On occasion, several thousand contiguous ordered entries may be added to or removed from any place in the list. Sometimes several thousand non-contiguous entries must be searched for and removed.

Python's list is unacceptable, since the cost of insertion and removal is prohibitive, and the small speed gains for binary search are irrelevant next to the total cost. Our in-house tests confirm this.

If I have neglected any detail, perhaps I could e-mail you a copy of my company's non-disclosure agreement and correspond with you privately on the matter. sarcasm.end().

+6
python linked-list list algorithm
10 answers

Here's a blog post sharing your pain. It includes a linked list implementation and a performance comparison.


Perhaps blist would be better, though (from here)?

The use cases where a BList is slightly slower than Python's list are as follows (O(log n) vs. O(1)):

  • A large list that never changes length.
  • Large lists in which insertions and deletions are only at the end (LIFO).

With that disclaimer out of the way, here are some of the use cases where a BList is substantially faster than the built-in list:

  • Insertion into or removal from a large list (O(log n) vs. O(n))
  • Taking large slices of large lists (O(log n) vs. O(n))
  • Making shallow copies of large lists (O(1) vs. O(n))
  • Changing large slices of large lists (O(log n + log k) vs. O(n + k))
  • Multiplying a list to make a large, sparse list (O(log k) vs. O(kn))

Note that it is actually implemented as a B+ tree, which is what gives it good performance for all of these operations.

+10

Python lists are O(1) for operations at the end of the list. If you do all your inserting in a semi-sequential fashion (by analogy with C, keeping only a single pointer into the middle of the list as a "cursor"), you can save yourself a lot of effort by just using two Python lists: one for what is before the cursor, one for what is after. Moving the cursor involves pulling the next item from one list and appending it to the other. This gives you arbitrary O(1) insertion at the cursor location with far less effort and wheel reinventing than building an entirely new data structure, and lets you reuse many of the existing list functions.
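The two-list cursor idea could look roughly like the sketch below. The class name `CursorList` and its method names are invented for illustration, and storing the "after" list in reverse order is my own assumption so that the item next to the cursor sits at the end of each list, where `append` and `pop` are cheap.

```python
class CursorList:
    """Sequence with one movable cursor, backed by two plain Python lists.

    `before` holds the items left of the cursor in order. `after` holds the
    items right of the cursor in REVERSE order, so the item adjacent to the
    cursor is always at the end of a list.
    """

    def __init__(self, items=()):
        self.before = []
        self.after = list(items)[::-1]

    def insert(self, item):
        """Insert item just left of the cursor (amortized O(1))."""
        self.before.append(item)

    def delete(self):
        """Remove and return the item just right of the cursor (O(1))."""
        return self.after.pop()

    def forward(self):
        """Move the cursor one item to the right."""
        self.before.append(self.after.pop())

    def backward(self):
        """Move the cursor one item to the left."""
        self.after.append(self.before.pop())

    def __iter__(self):
        yield from self.before
        yield from reversed(self.after)


cl = CursorList([1, 2, 4, 5])
cl.forward()
cl.forward()           # cursor now sits between 2 and 4
cl.insert(3)           # 1, 2, 3 | 4, 5
removed = cl.delete()  # takes out the 4
print(list(cl), removed)
```

Moving the cursor a long way still costs one step per item, so this only pays off when edits cluster around one position.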

For the fully general case that allows multiple references into the list, you are probably stuck with building a linked list.

Edit: You aren't seriously thinking of doing a “binary search” on a linked list, are you? Binary search doesn't even make sense on an inherently sequential data structure...

In any case, if you're fine with linear-time search and your inserts will always preserve the list order without re-sorting, then a simple linked list might be all you need. If you do as much searching as iterating, you should consider something with fast indexing, and if re-sorting will be necessary, something like a tree might be better.

+7

There is a singly linked list here (recipe 17.14 in the Python Cookbook, 1st ed.), but it is hardly "mature" or rich: it just implements a FIFO queue, so it is pretty minimal.

This recipe is a very concise C implementation of (read-only) Lisp-like cons-boxes: just car, cdr and cons. Again, it is not a rich type but a minimal one (and to use it for mutable data, as opposed to pure functional approaches, you would need to add at least setcar and setcdr). It may be a better starting point for you simply because cons-cells are so well known, flexible and familiar.

Some of the operations you need are probably best done by existing Python primitives. For sorting, for example, it's hard to see how rolling your own sort could beat Python's sorted(linkedlist) (as long, of course, as you make the linkedlist type a Python iterable so that it plays well with the rest of the language and library ;-), given the power of the timsort algorithm implemented in the Python runtime.

More generally, I suggest you carefully timeit things at every step along the way to see how much the C-coded approach is really buying you (compared with the trivially simple Python-coded one, exemplified by the Cookbook recipe whose URL I give at the start of this answer). That will depend crucially on the size and nature of your application's lists, so you are the one best placed to organize these benchmarks.
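To illustrate the point about iterability, here is a minimal sketch, assuming a bare singly linked list of my own invention (`Node`, `LinkedList` and `sorted_linked_list` are hypothetical names, not from any recipe mentioned here). Once `__iter__` exists, the built-in `sorted()` accepts the list directly.

```python
class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next


class LinkedList:
    def __init__(self, items=()):
        self.head = None
        # Build back to front so that each push onto the head is O(1).
        for item in reversed(list(items)):
            self.head = Node(item, self.head)

    def __iter__(self):
        # Making the type iterable is all that sorted() needs.
        node = self.head
        while node is not None:
            yield node.value
            node = node.next


def sorted_linked_list(linked):
    """Return a new LinkedList with the values in sorted order (timsort does the work)."""
    return LinkedList(sorted(linked))


ll = LinkedList([3, 1, 2])
result = sorted_linked_list(ll)
print(list(result))
```

This copies the values into a temporary Python list and back, which costs O(n) extra memory; whether that matters depends on your data.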
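A benchmark along those lines might start out like the sketch below. It only compares `list.insert` in the middle against relinking a pure-Python node at an already-known position; the `Node` class, `build_chain` helper and the sizes are assumptions for illustration, and a C-coded list would need its own entry in the comparison.

```python
import timeit


class Node:
    __slots__ = ("value", "next")

    def __init__(self, value, next=None):
        self.value = value
        self.next = next


def build_chain(n):
    """Return (head, middle_node) of a singly linked chain holding 0..n-1."""
    head = None
    middle = None
    for i in reversed(range(n)):
        head = Node(i, head)
        if i == n // 2:
            middle = head
    return head, middle


N = 20000
data = list(range(N))
head, middle = build_chain(N)


def insert_into_list():
    data.insert(N // 2, -1)              # shifts the tail of the array


def insert_into_chain():
    middle.next = Node(-1, middle.next)  # relinks two references


list_time = timeit.timeit(insert_into_list, number=1000)
chain_time = timeit.timeit(insert_into_chain, number=1000)
print("list.insert: %.6f s, linked insert: %.6f s" % (list_time, chain_time))
```

Note that this deliberately leaves out the cost of finding the insertion point, which is where the built-in list tends to win; measure the whole workload, not just one operation.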

+6

It's surprising that everyone demands a justification for needing a linked list. Linked lists are one of the most elementary data structures for a reason: they have properties that other basic data structures lack, and if you need those properties, you need a linked list or one of its close relatives. If you don't understand why linked lists are an important data structure that can't always be replaced with a deque or a binary tree, you should never have passed your intro to data structures class.

Here is a quick implementation that supports the usual things: constant-time insertion at any point given a reference to the node, splitting the list into two lists, and inserting a list into the middle of another list (splicing). The usual Python interfaces are supported: append, pop, appendleft, popleft, extend, ordinary iteration, and iteration over a slice (getiter).

I just wrote this, so it is doctested but not production tested; there are probably still bugs.

    def _ref(obj):
        """
        weakref.ref has a bit of braindamage: you can't take a weakref to None.
        This is a major hassle and a needless limitation; work around it.
        """
        from weakref import ref
        if obj is None:
            class NullRef(object):
                def __call__(self):
                    return None
            return NullRef()
        else:
            return ref(obj)


    class _node(object):
        def __init__(self, obj):
            self.obj = obj
            self._next = None
            self._prev = _ref(None)

        def __repr__(self):
            return "node(%s)" % repr(self.obj)

        def __call__(self):
            return self.obj

        @property
        def next(self):
            return self._next

        @property
        def prev(self):
            return self._prev()


    # Implementation note: all "_last" and "prev" links are weakrefs, to prevent circular references.
    # This is important; if we don't do this, every list will be a big circular reference.  This would
    # affect collection of the actual objects in the list, not just our node objects.
    #
    # This means that _node objects can't exist on their own; they must be part of a list, or nodes
    # in the list will be collected.  We also have to pay attention to references when we move nodes
    # from one list to another.
    class llist(object):
        """ Implements a doubly-linked list. """

        def __init__(self, init=None):
            self._first = None
            self._last = _ref(None)
            if init is not None:
                self.extend(init)

        def insert(self, item, node=None):
            """
            Insert item before node.  If node is None, insert at the end of the list.
            Return the node created for item.

            >>> l = llist()
            >>> a = l.insert(1)
            >>> b = l.insert(2)
            >>> d = l.insert(4)
            >>> l._check()
            [1, 2, 4]
            >>> c = l.insert(3, d)
            >>> l._check()
            [1, 2, 3, 4]
            """
            item = _node(item)

            if node is None:
                if self._last() is not None:
                    self._last()._next = item
                    item._prev = _ref(self._last())
                self._last = _ref(item)
                if self._first is None:
                    self._first = item
            else:
                assert self._first is not None, "insertion node must be None when the list is empty"
                if node._prev() is not None:
                    node._prev()._next = item
                item._prev = node._prev
                item._next = node
                node._prev = _ref(item)
                if node is self._first:
                    self._first = item
            return item

        def remove(self, node):
            """
            >>> l = llist()
            >>> a = l.append(1)
            >>> b = l.append(2)
            >>> c = l.append(3)
            >>> d = l.append(4)
            >>> e = l.append(5)
            >>> l.remove(c) # Check removing from the middle
            3
            >>> l._check()
            [1, 2, 4, 5]
            >>> l.remove(a) # Check removing from the start
            1
            >>> l._check()
            [2, 4, 5]
            >>> l.remove(e) # Check removing from the end
            5
            >>> l._check()
            [2, 4]
            """
            if self._first is node:
                self._first = node._next
            if self._last() is node:
                self._last = node._prev
            if node._next is not None:
                node._next._prev = node._prev
            if node._prev() is not None:
                node._prev()._next = node._next
            node._next = None
            node._prev = _ref(None)
            return node.obj

        def __bool__(self):
            """
            A list is true if it has any elements.

            >>> l = llist()
            >>> bool(l)
            False
            >>> l = llist([1])
            >>> bool(l)
            True
            """
            return self._first is not None

        __nonzero__ = __bool__  # Python 2 spelling of the same hook

        def __iter__(self):
            """
            >>> l = llist([1,2,3])
            >>> [i() for i in l]
            [1, 2, 3]
            """
            return self.getiter(self._first, self._last())

        def _check(self):
            # Walk the list, verify that every link is consistent, and return the items.
            if self._last() is None:
                assert self._first is None
                return []
            node = self._first
            ret = []
            while node is not None:
                if node._next is None:
                    assert node == self._last()
                if node._prev() is None:
                    assert node == self._first
                if node._next is not None:
                    assert node._next._prev() == node
                if node._prev() is not None:
                    assert node._prev()._next == node
                ret.append(node.obj)
                node = node._next
            return ret

        def getiter(self, first, last):
            """
            Return an iterator over [first,last].

            >>> l = llist()
            >>> l.append(1)
            node(1)
            >>> start = l.append(2)
            >>> l.extend([3,4,5,6])
            >>> end = l.append(7)
            >>> l.extend([8,9])
            >>> [i() for i in l.getiter(start, end)]
            [2, 3, 4, 5, 6, 7]
            """
            class listiter(object):
                def __init__(self, first, last):
                    self.node = first
                    self.final_node = last

                def __iter__(self):
                    return self

                def __next__(self):
                    ret = self.node
                    if ret is None:
                        raise StopIteration
                    if ret is self.final_node:
                        self.node = None
                    else:
                        self.node = self.node._next
                    return ret

                next = __next__  # Python 2 spelling of the same hook

            return listiter(first, last)

        def append(self, item):
            """
            Add an item to the end of the list.

            >>> l = llist()
            >>> l.append(1)
            node(1)
            >>> l.append(2)
            node(2)
            >>> l._check()
            [1, 2]
            """
            return self.insert(item, None)

        def appendleft(self, item):
            """
            Add an item to the beginning of the list.

            >>> l = llist()
            >>> l.appendleft(1)
            node(1)
            >>> l.appendleft(2)
            node(2)
            >>> l._check()
            [2, 1]
            """
            return self.insert(item, self._first)

        def pop(self):
            """
            Remove an item from the end of the list and return it.

            >>> l = llist([1,2,3])
            >>> l.pop()
            3
            >>> l.pop()
            2
            >>> l.pop()
            1
            >>> l.pop()
            Traceback (most recent call last):
            ...
            IndexError: pop from empty llist
            """
            if self._last() is None:
                raise IndexError("pop from empty llist")
            return self.remove(self._last())

        def popleft(self):
            """
            Remove an item from the beginning of the list and return it.

            >>> l = llist([1,2,3])
            >>> l.popleft()
            1
            >>> l.popleft()
            2
            >>> l.popleft()
            3
            >>> l.popleft()
            Traceback (most recent call last):
            ...
            IndexError: popleft from empty llist
            """
            if self._first is None:
                raise IndexError("popleft from empty llist")
            return self.remove(self._first)

        def splice(self, source, node=None):
            """
            Splice the contents of source into this list before node; if node is None,
            insert at the end.  Empty source_list.  Return the first and last nodes
            that were moved.

            # Test inserting at the beginning.
            >>> l = llist()
            >>> a = l.append(1)
            >>> b = l.append(2)
            >>> c = l.append(3)
            >>> l2 = llist([4,5,6])
            >>> l.splice(l2, a)
            (node(4), node(6))
            >>> l._check()
            [4, 5, 6, 1, 2, 3]
            >>> l2._check()
            []

            # Test inserting in the middle.
            >>> l = llist()
            >>> a = l.append(1)
            >>> b = l.append(2)
            >>> c = l.append(3)
            >>> l2 = llist([4,5,6])
            >>> l.splice(l2, b)
            (node(4), node(6))
            >>> l._check()
            [1, 4, 5, 6, 2, 3]
            >>> l2._check()
            []

            # Test inserting at the end.
            >>> l = llist()
            >>> a = l.append(1)
            >>> b = l.append(2)
            >>> c = l.append(3)
            >>> l2 = llist([4,5,6])
            >>> l.splice(l2, None)
            (node(4), node(6))
            >>> l._check()
            [1, 2, 3, 4, 5, 6]
            >>> l2._check()
            []

            # Test inserting a list with a single item.
            >>> l = llist()
            >>> a = l.append(1)
            >>> b = l.append(2)
            >>> c = l.append(3)
            >>> l2 = llist([4])
            >>> l.splice(l2, b)
            (node(4), node(4))
            >>> l._check()
            [1, 4, 2, 3]
            >>> l2._check()
            []
            """
            if source._first is None:
                return
            first = source._first
            last = source._last()

            if node is None:
                if self._last() is not None:
                    self._last()._next = source._first
                source._first._prev = self._last
                self._last = source._last
                if self._first is None:
                    self._first = source._first
            else:
                source._first._prev = node._prev
                source._last()._next = node
                if node._prev() is not None:
                    node._prev()._next = source._first
                node._prev = source._last
                if node is self._first:
                    self._first = source._first
            source._first = None
            source._last = _ref(None)
            return first, last

        def split(self, start, end=None):
            """
            Remove all items between [start, end] and return them in a new list.
            If end is None, remove until the end of the list.

            >>> l = llist()
            >>> a = l.append(1)
            >>> b = l.append(2)
            >>> c = l.append(3)
            >>> d = l.append(4)
            >>> e = l.append(5)
            >>> l._check()
            [1, 2, 3, 4, 5]
            >>> l2 = l.split(c, e)
            >>> l._check()
            [1, 2]
            >>> l2._check()
            [3, 4, 5]

            >>> l = llist()
            >>> a = l.append(1)
            >>> b = l.append(2)
            >>> c = l.append(3)
            >>> d = l.append(4)
            >>> e = l.append(5)
            >>> l2 = l.split(a, c)
            >>> l._check()
            [4, 5]
            >>> l2._check()
            [1, 2, 3]

            >>> l = llist()
            >>> a = l.append(1)
            >>> b = l.append(2)
            >>> c = l.append(3)
            >>> d = l.append(4)
            >>> e = l.append(5)
            >>> l2 = l.split(b, d)
            >>> l._check()
            [1, 5]
            >>> l2._check()
            [2, 3, 4]
            """
            if end is None:
                end = self._last()

            ret = llist()

            # First, move the region into the new list.  It's important to do this first, or
            # once we remove the nodes from the old list, they'll be held only by weakrefs and
            # nodes could end up being collected before we put it into the new one.
            ret._first = start
            ret._last = _ref(end)

            # Hook our own nodes back together.
            if start is self._first:
                self._first = end._next
            if end is self._last():
                self._last = start._prev
            if start._prev() is not None:
                start._prev()._next = end._next
            if end._next is not None:
                end._next._prev = start._prev
            start._prev = _ref(None)
            end._next = None
            return ret

        def extend(self, items):
            """
            >>> l = llist()
            >>> l.extend([1,2,3,4,5])
            >>> l._check()
            [1, 2, 3, 4, 5]
            """
            for item in items:
                self.append(item)


    if __name__ == "__main__":
        import doctest
        doctest.testmod()

+5

The Python deque class is O(1) for insertion and deletion at the beginning and end of the list.
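A short sketch of what that means in practice, using only `collections.deque` from the standard library. Keep in mind that `deque.insert` away from the ends is still linear in the distance to the nearer end, so on its own this does not cover insertion at arbitrary positions.

```python
from collections import deque

d = deque([2, 3, 4])
d.appendleft(1)      # O(1) at the left end
d.append(5)          # O(1) at the right end
first = d.popleft()  # O(1)
last = d.pop()       # O(1)
d.insert(1, 99)      # allowed, but not O(1) away from the ends
print(first, last, list(d))
```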

+3

“I will iterate over the list forwards (one by one), using a visitor on each entry, starting from the beginning or from a position found by a binary search. When an entry matching a predicate is found, it is removed from the list, and then another binary search is performed on a subset of the list beginning at the removed entry's previous position”

It sounds like a linked list is absolutely the wrong data structure for this: doing a binary search requires random access to the list, which on a linked list means repeatedly iterating over the elements. That is likely to be slower on a linked list than inserting and removing items in a Python list.

It sounds like the data structure you want is a skip list. Google turns up several implementations, but I cannot comment on their completeness or quality.

Edit:

Another data structure that may be suitable is a threaded binary tree. This is like a regular binary tree, but each leaf node points to the next/previous subtree, so it can be iterated through as efficiently as a linked list. Implementing it in Python is left as an exercise for the reader (or Google).
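For a feel of what a skip list involves, here is a minimal sketch under my own assumptions: `SkipNode`, `SkipList`, `MAX_HEIGHT` and the `seed` parameter are invented names, not taken from any of the implementations a search would find. It keeps keys sorted, with search, insert and remove taking expected O(log n) time, and plain forward iteration along the bottom level.

```python
import random


class SkipNode:
    def __init__(self, key, height):
        self.key = key
        self.forward = [None] * height  # forward[i] is the next node on level i


class SkipList:
    MAX_HEIGHT = 16  # assumed cap; enough for roughly 2**16 entries

    def __init__(self, seed=None):
        self.head = SkipNode(None, self.MAX_HEIGHT)  # sentinel, holds no key
        self.rng = random.Random(seed)

    def _random_height(self):
        # Each extra level is kept with probability 1/2.
        height = 1
        while height < self.MAX_HEIGHT and self.rng.random() < 0.5:
            height += 1
        return height

    def _predecessors(self, key):
        """For every level, find the last node whose key is strictly less than key."""
        preds = [self.head] * self.MAX_HEIGHT
        node = self.head
        for level in reversed(range(self.MAX_HEIGHT)):
            while node.forward[level] is not None and node.forward[level].key < key:
                node = node.forward[level]
            preds[level] = node
        return preds

    def insert(self, key):
        preds = self._predecessors(key)
        node = SkipNode(key, self._random_height())
        for level in range(len(node.forward)):
            node.forward[level] = preds[level].forward[level]
            preds[level].forward[level] = node

    def remove(self, key):
        """Remove one occurrence of key; return True if it was present."""
        preds = self._predecessors(key)
        target = preds[0].forward[0]
        if target is None or target.key != key:
            return False
        for level in range(len(target.forward)):
            preds[level].forward[level] = target.forward[level]
        return True

    def __contains__(self, key):
        node = self._predecessors(key)[0].forward[0]
        return node is not None and node.key == key

    def __iter__(self):
        node = self.head.forward[0]
        while node is not None:
            yield node.key
            node = node.forward[0]


sl = SkipList(seed=0)
for key in [30, 10, 50, 20, 40]:
    sl.insert(key)
sl.remove(30)
print(list(sl), 20 in sl, 30 in sl)
```

The `_predecessors` walk plays the role of the binary search in the question, and the bottom level behaves like an ordinary sorted linked list for the visitor pass.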

+3

For big data, keeping the list sorted is the trick. Don't insert; append new items at the end and then sort. Don't delete an item; replace it with a special value, sort that value to the end, and then pop it off. For lookups, a sorted list also gives very fast performance with bisection. For small data, iterating over the old list, filtering, and building a new one, as a list comprehension does, is always a fast way.

What counts as big data for me? It should be more than 1,000,000 items...
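A rough sketch of the append-then-sort and replace-then-sort idea, assuming numeric entries so that `float("inf")` can serve as the special value. The names `DELETED`, `add_many` and `delete_many` are made up for illustration.

```python
import bisect

DELETED = float("inf")  # assumed sentinel: sorts after every real entry


def add_many(data, new_items):
    """Append the new items, then sort once instead of inserting one by one."""
    data.extend(new_items)
    data.sort()


def delete_many(data, doomed):
    """Overwrite doomed entries with the sentinel, sort them to the end, cut them off."""
    # Locate every victim first, while the list is still fully sorted,
    # because bisect is only valid on sorted input.
    hits = set()
    for item in doomed:
        i = bisect.bisect_left(data, item)
        if i < len(data) and data[i] == item:
            hits.add(i)
    for i in hits:
        data[i] = DELETED
    data.sort()
    del data[len(data) - len(hits):]


data = [10, 20, 30, 40, 50]
add_many(data, [25, 35])
delete_many(data, [20, 40, 99])  # 99 is absent and is simply ignored
position = bisect.bisect_left(data, 30)
print(data, position)
```

Each batch costs one sort, so this suits workloads where adds and deletes arrive in bulk rather than one at a time.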

+1

Here is an idea that requires a bit of coding but may give you far better performance. It may or may not be suitable for your use case.

You can splice a new list into your list by replacing a single item. To insert the list [6, 7, 8] into [1, 2, 3, 4, 5] at index 2, you end up with

[1, 2, [3, 6, 7, 8], 4, 5]

Because the length of the large list (5 items here) does not change, you avoid the speed problems you have now.

You can “remove” an item from the list in the same way by replacing it with an empty list.

[1, 2, [], 4, 5]

Iterating over this mixed list is simple.

    def IterateNestedList(xs):
        for x in xs:
            if isinstance(x, list):
                for y in IterateNestedList(x):
                    yield y
            else:
                yield x

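Putting the pieces of this idea together might look like the sketch below. `splice_at`, `remove_at` and `flatten` are hypothetical helper names, and the sketch assumes that real entries are never lists themselves, since a nested list is what marks a spliced slot.

```python
def splice_at(xs, index, new_items):
    """Insert new_items after xs[index] by swapping that slot for a small sublist."""
    xs[index] = [xs[index]] + list(new_items)


def remove_at(xs, index):
    """'Remove' xs[index] by replacing it with an empty list."""
    xs[index] = []


def flatten(xs):
    """Yield the logical items in order, descending into nested sublists."""
    for x in xs:
        if isinstance(x, list):
            yield from flatten(x)
        else:
            yield x


big = [1, 2, 3, 4, 5]
splice_at(big, 2, [6, 7, 8])  # big is now [1, 2, [3, 6, 7, 8], 4, 5]
remove_at(big, 3)             # big is now [1, 2, [3, 6, 7, 8], [], 5]
print(list(flatten(big)), len(big))
```

The indexes here refer to slots of the outer list, not to logical positions, so binary search over the outer list needs extra care once slots hold sublists.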
0

I recently needed a circular doubly linked list. Since I am very familiar with the Linux kernel's linked list, I wrote a copycat linked list in Python. It provides O(1) random insertion and deletion, which is much faster than a Python list when you do random insertion and deletion in a large list. The code is here: https://github.com/ningke/Pylnklist . I also wrote a short introduction here: http://710003.blogspot.com/2012/06/copycat-linked-list-in-python.html

0

How about using a data structure that provides sorted access to the data? For example, balanced binary trees (AVL, red-black)? They guarantee O(log N) insertion complexity. Not O(1), but better than what you have.

0
