A dictionary implementation where equivalent content is equal and returns the same hash code regardless of insertion order

I need Dictionary<long, string> collections such that, given two instances d1 and d2 that hold the same KeyValuePair<long, string> contents (possibly inserted in a different order):

  • (d1 == d2) evaluates to true
  • d1.GetHashCode() == d2.GetHashCode()

The first requirement was achieved most easily by using a SortedDictionary instead of a regular Dictionary.

The second requirement is necessary because at one point I need to store a Dictionary<Dictionary<long, string>, List<string>>, where the main Dictionary type is used as the key of another Dictionary. If the hash codes are not computed from the contents, ContainsKey() will not work the way I want (i.e. if an item equal to d1 has already been inserted into the dictionary, then dictionary.ContainsKey(d2) should evaluate to true).
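To make the starting point concrete, here is a minimal sketch of how a plain Dictionary<long, string> behaves by default. The class name DefaultDictionaryEquality and the sample values are only illustrative; the point is that Dictionary<TKey, TValue> does not override Equals or GetHashCode, so both requirements fail until something content-based is supplied.

```csharp
using System;
using System.Collections.Generic;

public static class DefaultDictionaryEquality
{
    public static void Main()
    {
        // Same key/value pairs, inserted in a different order.
        var d1 = new Dictionary<long, string> { { 1L, "one" }, { 2L, "two" } };
        var d2 = new Dictionary<long, string> { { 2L, "two" }, { 1L, "one" } };

        // Dictionary<TKey, TValue> does not override Equals or GetHashCode,
        // so both comparisons fall back to reference identity.
        Console.WriteLine(d1 == d2);      // False: two distinct instances
        Console.WriteLine(d1.Equals(d2)); // False: Object.Equals compares references here

        // As a result, d2 is not found when d1 is the stored key.
        var outer = new Dictionary<Dictionary<long, string>, List<string>>();
        outer.Add(d1, new List<string> { "payload" });
        Console.WriteLine(outer.ContainsKey(d2)); // False
    }
}
```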

To achieve this, I created a new class, ComparableDictionary : SortedDictionary<long, string>, and included the following:

    public override int GetHashCode()
    {
        // Build a string from every key/value pair (in sorted key order)
        // and use the hash code of that string.
        StringBuilder str = new StringBuilder();
        foreach (var item in this)
        {
            str.Append(item.Key);
            str.Append("_");
            str.Append(item.Value);
            str.Append("%%");
        }
        return str.ToString().GetHashCode();
    }

In my unit tests, this meets both the equality and the hash code requirements. However, while reading "Guidelines and rules for GetHashCode", I came across the following:

Rule: the integer returned by GetHashCode must never change while the object is contained in a data structure that depends on the hash code remaining stable

It is permissible, though dangerous, to make an object whose hash code value can mutate as the fields of the object mutate. If you have such an object and you put it in a hash table, then the code that mutates the object and the code that maintains the hash table are required to have some agreed-upon protocol that ensures that the object is not mutated while it is in the hash table. What that protocol looks like is up to you.

If an object's hash code can mutate while it is in the hash table, then clearly the Contains method stops working. You put the object in bucket #5, you mutate it, and when you ask the set whether it contains the mutated object, it looks in bucket #74 and doesn't find it.

Remember, objects can be put into hash tables in ways that you didn't expect. A lot of the LINQ sequence operators use hash tables internally. Don't go dangerously mutating objects while enumerating a LINQ query that returns them!
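The failure mode described in the quoted passage is easy to reproduce. The following is only a sketch: MutableKey is a hypothetical type invented for the demonstration, with a hash code taken directly from a mutable property, and the values 5 and 74 simply echo the bucket numbers in the quote.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key type whose hash code depends on a mutable property.
public sealed class MutableKey
{
    public int Value { get; set; }

    public override bool Equals(object obj)
    {
        var other = obj as MutableKey;
        return other != null && other.Value == Value;
    }

    public override int GetHashCode()
    {
        return Value;
    }
}

public static class MutatedKeyDemo
{
    public static void Main()
    {
        var key = new MutableKey { Value = 5 };
        var set = new HashSet<MutableKey> { key };

        Console.WriteLine(set.Contains(key)); // True: stored and looked up with hash 5

        key.Value = 74; // mutate the object while it is inside the set

        // The lookup now uses hash 74, but the entry was stored under hash 5.
        Console.WriteLine(set.Contains(key)); // False
    }
}
```

The same object is still physically inside the set; it just cannot be found through its current hash code, which is exactly the situation the rule warns about.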

Now, the Dictionary<ComparableDictionary, List<String>> is used only once in the code, at a point where the contents of all the ComparableDictionary collections should already be set. So, according to these guidelines, I believe it is acceptable to override GetHashCode as I have done (basing it entirely on the contents of the dictionary).

After that introduction, my questions are:

  • I know that SortedDictionary performs poorly compared to Dictionary (and I can have hundreds of instances). The only reason I use SortedDictionary is so that equality comparison works on the contents of the dictionary, regardless of insertion order. Is there a better way to meet this equality requirement without using a SortedDictionary?
  • Is my GetHashCode implementation acceptable given the requirements? Even though it is based on mutable contents, I don't think it poses any risk, since the only place it is used (I think) is after the contents have been set.

Note: although I have set these up using Dictionary or SortedDictionary, I am not tied to these collection types. The basic need is a collection that can store pairs of values and meets the equality and hashing requirements defined above.

1 answer

Your implementation of GetHashCode seems acceptable to me, but it is not the way I would do it.

This is what I would do:

  • Use composition, not inheritance. Among other things, inheritance gets awkward where equality is concerned
  • Use a Dictionary<TKey, TValue> internally
  • Implement GetHashCode by taking the XOR of the hash codes of the individual key/value pairs
  • Implement equality by checking whether the sizes are the same, then checking every key in "this" to see whether its value is the same in the other dictionary.

So something like this:

    public sealed class EquatableDictionary<TKey, TValue>
        : IDictionary<TKey, TValue>, IEquatable<EquatableDictionary<TKey, TValue>>
    {
        private readonly Dictionary<TKey, TValue> dictionary;

        public override bool Equals(object other)
        {
            return Equals(other as EquatableDictionary<TKey, TValue>);
        }

        public bool Equals(EquatableDictionary<TKey, TValue> other)
        {
            if (ReferenceEquals(other, null))
            {
                return false;
            }
            if (Count != other.Count)
            {
                return false;
            }
            // Same size, so it is enough to check that every pair in "this"
            // has a matching pair in the other dictionary.
            foreach (var pair in this)
            {
                TValue otherValue;
                if (!other.TryGetValue(pair.Key, out otherValue))
                {
                    return false;
                }
                if (!EqualityComparer<TValue>.Default.Equals(pair.Value, otherValue))
                {
                    return false;
                }
            }
            return true;
        }

        public override int GetHashCode()
        {
            int hash = 0;
            foreach (var pair in this)
            {
                // Hash each pair on its own, then combine with XOR so that
                // the enumeration order does not affect the result.
                int miniHash = 17;
                miniHash = miniHash * 31 + EqualityComparer<TKey>.Default.GetHashCode(pair.Key);
                miniHash = miniHash * 31 + EqualityComparer<TValue>.Default.GetHashCode(pair.Value);
                hash ^= miniHash;
            }
            return hash;
        }

        // Implementation of IDictionary<,> which just delegates to the dictionary
    }
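To see the approach end to end, here is a trimmed-down, runnable sketch. It is not the full class from the answer: EquatableDictionarySketch is a hypothetical name, and it exposes only Add, TryGetValue and Count instead of implementing all of IDictionary<,>. The unchecked block is an addition that keeps the arithmetic from throwing if the project is compiled with overflow checking.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, reduced variant: only the members needed for the demonstration.
public sealed class EquatableDictionarySketch<TKey, TValue>
    : IEquatable<EquatableDictionarySketch<TKey, TValue>>
{
    private readonly Dictionary<TKey, TValue> dictionary = new Dictionary<TKey, TValue>();

    public int Count { get { return dictionary.Count; } }

    public void Add(TKey key, TValue value) { dictionary.Add(key, value); }

    public bool TryGetValue(TKey key, out TValue value)
    {
        return dictionary.TryGetValue(key, out value);
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as EquatableDictionarySketch<TKey, TValue>);
    }

    public bool Equals(EquatableDictionarySketch<TKey, TValue> other)
    {
        if (ReferenceEquals(other, null) || Count != other.Count) return false;
        foreach (var pair in dictionary)
        {
            TValue otherValue;
            if (!other.TryGetValue(pair.Key, out otherValue)
                || !EqualityComparer<TValue>.Default.Equals(pair.Value, otherValue))
            {
                return false;
            }
        }
        return true;
    }

    public override int GetHashCode()
    {
        unchecked // allow integer overflow to wrap around
        {
            int hash = 0;
            foreach (var pair in dictionary)
            {
                int miniHash = 17;
                miniHash = miniHash * 31 + EqualityComparer<TKey>.Default.GetHashCode(pair.Key);
                miniHash = miniHash * 31 + EqualityComparer<TValue>.Default.GetHashCode(pair.Value);
                hash ^= miniHash; // XOR is commutative, so order is irrelevant
            }
            return hash;
        }
    }
}

public static class EquatableDictionaryDemo
{
    public static void Main()
    {
        var d1 = new EquatableDictionarySketch<long, string>();
        d1.Add(1L, "one");
        d1.Add(2L, "two");

        var d2 = new EquatableDictionarySketch<long, string>();
        d2.Add(2L, "two"); // same pairs, opposite insertion order
        d2.Add(1L, "one");

        Console.WriteLine(d1.Equals(d2));                        // True
        Console.WriteLine(d1.GetHashCode() == d2.GetHashCode()); // True

        var outer = new Dictionary<EquatableDictionarySketch<long, string>, List<string>>();
        outer.Add(d1, new List<string> { "payload" });
        Console.WriteLine(outer.ContainsKey(d2));                // True
    }
}
```

Note that d1 == d2 would still compare references here; making that expression true as well would require overloading the == and != operators on the class.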

Also note that I can't remember whether EqualityComparer<T>.Default.GetHashCode copes with null values - I have a suspicion that it does, returning 0 for null. Worth checking though :)
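One way to run that check on whatever runtime you target is a small probe like the one below. It is only a sketch (NullHashCheck is a hypothetical name). It prints whatever the default comparer returns for a null string and also handles the ArgumentNullException that the documentation for EqualityComparer<T>.GetHashCode lists as a possibility, so no particular outcome is assumed.

```csharp
using System;
using System.Collections.Generic;

public static class NullHashCheck
{
    public static void Main()
    {
        string missing = null;
        try
        {
            // Ask the default comparer for the hash code of a null reference.
            int hash = EqualityComparer<string>.Default.GetHashCode(missing);
            Console.WriteLine("Hash code for null: " + hash);
        }
        catch (ArgumentNullException)
        {
            Console.WriteLine("GetHashCode rejected the null argument.");
        }
    }
}
```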
