Unique data structure and quick add and delete

Question

Unique data structure and quick add and delete

I need a data structure with the following properties:

Each element of the structure must be unique.
Add: Adds one item to the data structure if the item already exists.
Pop: removes one item from the data structure and returns the item is deleted. It doesn’t matter which item is deleted.

No other operations are required for this structure. A naive implementation with a list will require almost O (1) time for Pop and O (N) time for adding (since the entire list must be checked to ensure uniqueness). I am currently using red-black tree to meet the needs of this data structure, but I am wondering if I can use something less complex to achieve almost the same performance.

I prefer answers in C #, but Java, Javascript and C ++ are also acceptable.

My question is similar to this question , however I do not need to search or delete the maximum or minimum value (or even any particular type of value), so I was hoping that there would be improvements in this regard. However, if any of the structures in this matter is appropriate, let me know.

So, what data structure allows you to use only unique elements, supports fast additions and deletions, and is less complicated than a red-black tree?

+4

c # algorithm data-structures

Peter O. Aug 31 '11 at 18:36

source share

3 answers

As Meta-Knight said, HashSet is the fastest data structure that does just that. Search and delete take constant O (1) time (except in rare cases when your hash sucks and then you need to reuse it repeatedly or you use a hash table). All operations with a hash set take O (1) time, the only drawback is that it requires more memory, because the hash is used as an index in an array (or other allocated memory block). Therefore, if you are NOT REALLY adhering strictly to memory, go to HashSet. I just explain the reason why you should go with this approach, and you should agree with the answer of the Meta-Knights, since it was the first.

The use of hashes is fine, because usually you override the HashCode () and Equals () functions. What the HashSet does internally generates a hash, then if it is equal to the equality check (only in the case of hash collisions). If this is not the case, he must call the method to do the so-called rehashing, which generates a new hash, which is usually in an odd primary offset from the original hash (not sure if .NET does this, but other languages) and repeat the process as necessary .

+5

Jesus ramos Aug 31 '11 at 18:42

source share

Removing a random item is pretty easy from a hash or dictionary. Everything is averaged by O (1), which in the real world means O (1). Example:

 public class MyNode { ... } public class MyDataStructure { private HashSet<MyNode> nodes = new HashSet<MyNode>(); /// <summary> /// Inserts an element to this data structure. /// If the element already exists, returns false. /// Complexity is averaged O(1). /// </summary> public bool Add(MyNode node) { return node != null && this.nodes.Add(node); } /// <summary> /// Removes a random element from the data structure. /// Returns the element if an element was found. /// Returns null if the data structure is empty. /// Complexity is averaged O(1). /// </summary> public MyNode Pop() { // This loop can execute 1 or 0 times. foreach (MyNode node in nodes) { this.nodes.Remove(node); return node; } return null; } }

Almost everything that can be compared can also be hashed :) in my experience. I would like to know if there is someone who knows something that cannot be hashed.

In my experience, this also applies to some tolerance floating point comparisons using special methods.

The hash function for the hash table does not have to be perfect, it just has to be good enough. Also, if your data is very complex, usually hash functions are less complex than red black trees or avl trees. They are useful because they preserve the order of things, but you do not need it.

To show how to make a simple hashset, I will consider a simple dictionary with integer keys. This implementation is very fast and very good for rare arrays for examples. I did not write code for the growth of the bucket table, because it is annoying and usually a source of big mistakes, but since this is proof of concept, this should be enough. I did not write an iterator either.

I wrote it in place, there may be errors.

 public class FixedIntDictionary<T> { // Our internal node structure. // We use structs instead of objects to not add pressure to the garbage collector. // We mantains our own way to manage garbage through the use of a free list. private struct Entry { // The key of the node internal int Key; // Next index in pEntries array. // This field is both used in the free list, if node was removed // or in the table, if node was inserted. // -1 means null. internal int Next; // The value of the node. internal T Value; } // The actual hash table. Contains indices to pEntries array. // The hash table can be seen as an array of singlt linked list. // We store indices to pEntries array instead of objects for performance // and to avoid pressure to the garbage collector. // An index -1 means null. private int[] pBuckets; // This array contains the memory for the nodes of the dictionary. private Entry[] pEntries; // This is the first node of a singly linked list of free nodes. // This data structure is called the FreeList and we use it to // reuse removed nodes instead of allocating new ones. private int pFirstFreeEntry; // Contains simply the number of items in this dictionary. private int pCount; // Contains the number of used entries (both in the dictionary or in the free list) in pEntries array. // This field is going only to grow with insertions. private int pEntriesCount; ///<summary> /// Creates a new FixedIntDictionary. /// tableBucketsCount should be a prime number /// greater than the number of items that this /// dictionary should store. /// The performance of this hash table will be very bad /// if you don't follow this rule! /// </summary> public FixedIntDictionary<T>(int tableBucketsCount) { // Our free list is initially empty. this.pFirstFreeEntry = -1; // Initializes the entries array with a minimal amount of items. this.pEntries = new Entry[8]; // Allocate buckets and initialize every linked list as empty. int[] buckets = new int[capacity]; for (int i = 0; i < buckets.Length; ++i) buckets[i] = -1; this.pBuckets = buckets; } ///<summary>Gets the number of items in this dictionary. Complexity is O(1).</summary> public int Count { get { return this.pCount; } } ///<summary> /// Adds a key value pair to the dictionary. /// Complexity is averaged O(1). /// Returns false if the key already exists. /// </summary> public bool Add(int key, T value) { // The hash table can be seen as an array of linked list. // We find the right linked list using hash codes. // Since the hash code of an integer is the integer itself, we have a perfect hash. // After we get the hash code we need to remove the sign of it. // To do that in a fast way we and it with 0x7FFFFFFF, that means, we remove the sign bit. // Then we have to do the modulus of the found hash code with the size of our buckets array. // For this reason the size of our bucket array should be a prime number, // this because the more big is the prime number, the less is the chance to find an // hash code that is divisible for that number. This reduces collisions. // This implementation will not grow the buckets table when needed, this is the major // problem with this implementation. // Growing requires a little more code that i don't want to write now // (we need a function that finds prime numbers, and it should be fast and we // need to rehash everything using the new buckets array). int bucketIndex = (key & 0x7FFFFFFF) % this.pBuckets.Length; int bucket = this.pBuckets[bucketIndex]; // Now we iterate in the linked list of nodes. // Since this is an hash table we hope these lists are very small. // If the number of buckets is good and the hash function is good this will translate usually // in a O(1) operation. Entry[] entries = this.pEntries; for (int current = entries[bucket]; current != -1; current = entries[current].Next) { if (entries[current].Key == key) { // Entry already exists. return false; } } // Ok, key not found, we can add the new key and value pair. int entry = this.pFirstFreeEntry; if (entry != -1) { // We found a deleted node in the free list. // We can use that node without "allocating" another one. this.pFirstFreeEntry = entries[entry].Next; } else { // Mhhh ok, the free list is empty, we need to allocate a new node. // First we try to use an unused node from the array. entry = this.pEntriesCount++; if (entry >= this.pEntries) { // Mhhh ok, the entries array is full, we need to make it bigger. // Here should go also the code for growing the bucket table, but i'm not writing it here. Array.Resize(ref this.pEntries, this.pEntriesCount * 2); entries = this.pEntries; } } // Ok now we can add our item. // We just overwrite key and value in the struct stored in entries array. entries[entry].Key = key; entries[entry].Value = value; // Now we add the entry in the right linked list of the table. entries[entry].Next = this.pBuckets[bucketIndex]; this.pBuckets[bucketIndex] = entry; // Increments total number of items. ++this.pCount; return true; } /// <summary> /// Gets a value that indicates wether the specified key exists or not in this table. /// Complexity is averaged O(1). /// </summary> public bool Contains(int key) { // This translate in a simple linear search in the linked list for the right bucket. // The operation, if array size is well balanced and hash function is good, will be almost O(1). int bucket = this.pBuckets[(key & 0x7FFFFFFF) % this.pBuckets.Length]; Entry[] entries = this.pEntries; for (int current = entries[bucket]; current != -1; current = entries[current].Next) { if (entries[current].Key == key) { return true; } } return false; } /// <summary> /// Removes the specified item from the dictionary. /// Returns true if item was found and removed, false if item doesn't exists. /// Complexity is averaged O(1). /// </summary> public bool Remove(int key) { // Removal translate in a simple contains and removal from a singly linked list. // Quite simple. int bucketIndex = (key & 0x7FFFFFFF) % this.pBuckets.Length; int bucket = this.pBuckets[bucketIndex]; Entry[] entries = this.pEntries; int next; int prev = -1; int current = entries[bucket]; while (current != -1) { next = entries[current].Next; if (entries[current].Key == key) { // Found! Remove from linked list. if (prev != -1) entries[prev].Next = next; else this.pBuckets[bucketIndex] = next; // We now add the removed node to the free list, // so we can use it later if we add new elements. entries[current].Next = this.pFirstFreeEntry; this.pFirstFreeEntry = current; // Decrements total number of items. --this.pCount; return true; } prev = current; current = next; } return false; } }

If you wander if this implementation is good or not, this is a very similar implementation of what the .NET platform does for the Dictionary class :)

To make it hashset, just delete T and you have hashset of integers. If you need to get hash codes for shared objects, just use x.GetHashCode or provide your hash function.

To write iterators, you need to change a few things, but don’t want to add too many other things to this post :)

+3

Salvatore previti Aug 31 '11 at 18:50

source share

Meta-knight · Accepted Answer · 2011-08-31T18:39:31+0000

What about the built-in HashSet<T> ?

It contains only unique elements. Remove (pop) is O (1), and Add is O (1) if the internal size of the array should not be changed.

Unique data structure and quick add and delete

More articles: