In most research work on priority queues, each element in the queue has an associated number, called a priority, which is set when the element is inserted. The elements are then dequeued in ascending order of priority. Most programming languages that support priority queues these days don't actually use explicit numeric priorities and instead rely on a comparison function to rank items, but the soft heap uses this "numeric priority" model.
Since priority queues dequeue elements in order of increasing priority, they can be used to sort a sequence of values: insert each element into a priority queue with a priority equal to its value, then dequeue everything from the queue. The items come out in sorted order.
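For concreteness, here's a minimal sketch of that idea using Python's built-in heapq module as the priority queue (the function name pq_sort is just mine, for illustration):

```python
import heapq

def pq_sort(values):
    """Sort by inserting everything into a priority queue, then dequeuing it all."""
    pq = []
    for v in values:
        heapq.heappush(pq, v)          # insert each element with its value as its priority
    return [heapq.heappop(pq) for _ in range(len(pq))]  # items come out in ascending order

print(pq_sort([5, 1, 4, 2, 3]))        # prints [1, 2, 3, 4, 5]
```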
This connection between priority queues and sorting comes at a cost. There is a well-known lower bound for comparison-based sorting: no comparison-based sorting algorithm can run faster than O(n log n). That, in turn, gives a lower bound on the performance of any comparison-based priority queue: n insertions and n dequeues, taken together, cannot run faster than O(n log n). In most cases that's fine, but sometimes it isn't fast enough.
Since a priority queue can be used to sort its input, the running time of n insertions and n dequeues can never beat O(n log n). But what if the priority queue doesn't have to sort its input? Take this to the extreme: if the priority queue is allowed to hand back items in a completely arbitrary order, then n enqueues and n dequeues can be done in O(n) time, for example by backing the queue with a stack or a plain FIFO queue.
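As a sketch of that extreme (the class name is mine, not a standard structure), a "priority queue" that ignores priorities entirely can just be a stack, so every operation is O(1):

```python
class ArbitraryOrderQueue:
    """Toy 'priority queue' with no ordering guarantees: each operation is O(1),
    so n enqueues plus n dequeues cost O(n) total."""
    def __init__(self):
        self._items = []

    def enqueue(self, item, priority):
        self._items.append((priority, item))   # the priority is recorded but never used

    def dequeue(self):
        priority, item = self._items.pop()     # LIFO order - arbitrary with respect to priority
        return item
```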
Intuitively, you can think of a soft heap as a bridge between these two extremes of "always sorted" and "no guarantees about order." Every soft heap is parameterized by a value ε, called the corruption parameter, which determines how close to sorted the values coming out of the soft heap can be. As ε approaches 0 the output becomes more and more sorted, and as ε approaches 1 the output becomes more and more arbitrary. Correspondingly, the running time of soft heap operations is a function of the form O(log ε⁻¹), so operations get cheaper and cheaper as ε grows (and the output becomes less sorted), and more and more expensive as ε shrinks (in which case the output is closer to sorted).
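As a quick back-of-the-envelope check (my own arithmetic, just plugging values into the O(log ε⁻¹) bound quoted above), here is how the total cost of n operations behaves at the two ends of the spectrum:

```latex
\[
\begin{aligned}
&\text{Per-operation cost } O(\log \varepsilon^{-1}) \;\Rightarrow\; \text{total for } n \text{ operations: } O(n \log \varepsilon^{-1}).\\
&\varepsilon = \tfrac{1}{2}: \; O(n \log 2) = O(n) && \text{(cheap; output may be far from sorted)}\\
&\varepsilon = \tfrac{1}{n}: \; O(n \log n) && \text{(as costly as sorting; output essentially sorted)}
\end{aligned}
\]
```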
The soft heap pins down exactly how unsorted its output can be using a new concept called corruption. In a regular priority queue, once you insert an element/priority pair, the element's priority never changes. In a soft heap, the priority associated with an element can become corrupted while the element is inside the soft heap. When an element's priority is corrupted, its priority increases by some amount. (Since the soft heap dequeues elements in order of increasing priority, raising an element's priority means it will come out of the queue later than it normally would.) In other words, corruption causes elements to come out not in sorted order, since the priorities the elements have when they are dequeued don't necessarily match the priorities they had when they were inserted.
The choice of ε tunes how many different elements may be corrupted. With a smaller ε, fewer elements have corrupted priorities; with a larger ε, more elements will have corrupted priorities. (The standard guarantee is that after n insertions, at most εn of the elements in the heap hold corrupted priorities.)
Now, on to your specific questions: how do elements' priorities get corrupted, and how is that useful? Your first question is a good one: how does the data structure decide when to corrupt priorities? There are two ways to look at this. First, you can think of a soft heap as a data structure where you specify up front how much corruption is acceptable (that's the ε parameter), and the data structure then internally decides when and how to corrupt priorities, as long as it stays within that overall corruption budget. If it seems strange for a data structure to make decisions like that, think of something like a Bloom filter or a skiplist, where random choices made internally can affect the externally observable behavior of the data structure. As it turns out, soft heaps are usually implemented without any randomness (an impressive feature!), but that isn't particularly relevant here.
Internally, the two well-known implementations of soft heaps (the one from Chazelle's original paper and a later cleanup using binary trees) implement corruption using a technique called car-pooling, where elements are grouped together and all share a common priority. Corruption happens because the original priorities of the elements in each group are forgotten and the new shared priority is used instead. The actual details of how the elements get grouped are fearsomely difficult and aren't really worth looking into, so it's best to leave it at "the data structure corrupts whatever it wants to, provided it never corrupts more elements than you allowed when you picked ε."
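For intuition only, here's a tiny toy sketch in Python of the car-pooling idea, under the crude assumption that we just chop the input into fixed-size groups and let each group share the largest priority in it. This is not a real soft heap (no trees, no amortized bounds, no ε accounting); it just shows how sharing one priority per group corrupts some elements:

```python
import heapq

def carpool_dequeue_order(values, group_size):
    """Toy illustration: group consecutive inserted values, give each group the
    largest priority in that group, and dequeue whole groups by that shared priority."""
    groups = [values[i:i + group_size] for i in range(0, len(values), group_size)]
    pq = [(max(g), g) for g in groups]    # every element in g now "has" priority max(g)
    heapq.heapify(pq)
    out = []
    while pq:
        _, g = heapq.heappop(pq)
        out.extend(g)                     # corrupted elements ride along with their group
    return out

print(carpool_dequeue_order([6, 2, 9, 1, 5, 3], group_size=2))  # [5, 3, 6, 2, 9, 1]
```

In this run, the elements 2, 1, and 3 ride along under the shared priorities 6, 9, and 5, so they come out later than a strict priority queue would ever allow.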
Next, why is this useful? In practice, honestly, it isn't: soft heaps are almost exclusively of theoretical interest. The reason they matter is that the running time of n insertions into and n deletions from a soft heap can be O(n) - faster than O(n log n) - if ε is chosen appropriately. Soft heaps were originally introduced as a building block in a fast algorithm for computing minimum spanning trees. They also appear in a newer algorithm for linear-time selection, the first deterministic linear-time selection algorithm since the famous median-of-medians algorithm. In both cases, the soft heap is used to "approximately" sort the input in a way that gives the algorithm a rough approximation of the sorted sequence, after which the algorithm does some extra work to correct for the imperfection. You will almost certainly never see a soft heap used in practice, but if you do eventually run into a case where you need one, leave a comment and let me know!
Summarizing:
- Priority corruption is a way of trading off between perfectly sorted output (accurate, but slow) and arbitrary ordering (inaccurate, but very fast). The parameter ε determines where on that spectrum the amount of corruption falls.
- Corruption works by changing the priorities of elements already inside the soft heap, specifically by raising the priorities of some of them. Little corruption corresponds to nearly sorted output, while heavy corruption corresponds to more arbitrary output.
- How corruption happens is specific to the data structure and is very hard to understand. It's best to think of soft heaps as corrupting priorities whenever they want to, but never exceeding the limit imposed by the choice of ε.
- Corruption is useful in theoretical settings where full sorting is too slow but an approximately sorted sequence is good enough to work with. It's unlikely to be useful in practical applications.
Hope this helps!