Create a hash value for a list?

I have a List<MyRichObject> with 50 instances in it. Each instance has 1 or 2 unique properties, but in a way they are all unique, since each one occupies its own position in the list, etc.

I would like to come up with a way to "hash" this list so that it can be told apart from all other lists. Is there a way to do this in .NET 4?

The goal is to create a kind of "moniker" for the lists so that they can be dropped into a queue and found later based on their unique value.

Thanks.

2 answers

TL;DR

    public static int GetSequenceHashCode<T>(this IList<T> sequence)
    {
        const int seed = 487;
        const int modifier = 31;

        unchecked
        {
            return sequence.Aggregate(seed, (current, item) =>
                (current * modifier) + item.GetHashCode());
        }
    }

Why bother with another answer?

The accepted answer can give dangerously inaccurate results if the list contains several items with the same hash code. For example, consider these inputs:

    var a = new[] { "foo" };
    var b = new[] { "foo", "bar" };
    var c = new[] { "foo", "bar", "spam" };
    var d = new[] { "seenoevil", "hearnoevil", "speaknoevil" };

These all produce different results, which suggests they are all unique collections. Great! Now let's try with a duplicate:

    var e = new[] { "foo", "bar", "spam" };

The accepted answer's GetSequenceHashCode should produce the same result for c and e, and it does. So far, so good. Now let's try with the items out of sequence:

    var f = new[] { "spam", "bar", "foo" };

Uh oh ... the accepted answer's GetSequenceHashCode indicates that f is equal to both c and e, which it is not. Why is this happening? First, break it down into the actual hash code values, using c as an example:

    int hashC = "foo".GetHashCode() ^ "bar".GetHashCode() ^ "spam".GetHashCode();

Since the exact numbers are not really important here, and for the sake of a clearer demonstration, let's pretend that the hash codes of the three strings are foo=8, bar=16 and spam=32. So:

    int hashC = 8 ^ 16 ^ 32;

or, broken down into binary representation:

    8 ^ 16 ^ 32 == 56;

    //  8 = 00001000
    //  ^
    // 16 = 00010000
    //  ^
    // 32 = 00100000
    //  =
    // 56 = 00111000

Now you can see why the order of the items in the list is ignored by this implementation: XOR is commutative, i.e. 8^16^32 = 16^8^32 = 32^16^8 etc.
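To check the order argument yourself, here is a minimal sketch that XORs the pretend hash codes directly. The XorFold and ToBinary helpers are illustrative names, not part of either answer; they only mirror what the XOR-based approach does with the real hash codes.

```csharp
using System;
using System.Linq;

static class XorOrderDemo
{
    // XOR-folds a set of stand-in hash codes (hypothetical helper).
    public static int XorFold(params int[] codes)
    {
        return codes.Aggregate(0, (total, next) => total ^ next);
    }

    // Formats a value as an 8-digit binary string for display.
    public static string ToBinary(int value)
    {
        return Convert.ToString(value, 2).PadLeft(8, '0');
    }

    static void Main()
    {
        Console.WriteLine(ToBinary(XorFold(8, 16, 32)));             // 00111000
        Console.WriteLine(XorFold(8, 16, 32) == XorFold(16, 8, 32)); // True
        Console.WriteLine(XorFold(8, 16, 32) == XorFold(32, 16, 8)); // True
    }
}
```

Whatever order the three values arrive in, the same bits end up set, so the position of an item never influences the result.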

Secondly, there is the problem with duplicates. Even if you assume that having the same content in a different order is OK (not an approach I would recommend), I don't think anyone will argue that the behavior below is desirable. Let's try variations with duplicates in each list.

    var a = new[] { "foo", "bar", "spam" };
    var b = new[] { "foo", "bar", "spam", "foo" };
    var c = new[] { "foo", "bar", "spam", "foo", "foo" };
    var d = new[] { "foo", "bar", "spam", "foo", "foo", "spam", "foo", "spam", "foo" };

While a and b generate different sequence hashes, the accepted answer's GetSequenceHashCode suggests that a, c and d are all the same. Why?

If you XOR a number with itself, you essentially cancel it out, i.e.

    8 ^ 8 == 0;

    // 8 = 00001000
    // ^
    // 8 = 00001000
    // =
    // 0 = 00000000

XOR with the same number once more gives you the original value back, i.e.

    8 ^ 8 ^ 8 == 8;

    // 8 = 00001000
    // ^
    // 8 = 00001000
    // ^
    // 8 = 00001000
    // =
    // 8 = 00001000

So, if we look at a and c again, substituting the simplified hash codes:

    var a = new[] { 8, 16, 32 };
    var c = new[] { 8, 16, 32, 8, 8 };

the hash codes are calculated as:

    int hashA = 8 ^ 16 ^ 32;         // = 56
    int hashC = 8 ^ 16 ^ 32 ^ 8 ^ 8; // = 56
    //                        ↑   ↑
    //                        these two cancel each other out

The same happens with d, where each additional pair of foo and each additional pair of spam cancels itself out.
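The duplicate problem can be reproduced with a short sketch that puts the two approaches side by side. GetXorHashCode is a name invented here for the XOR-based version so that both can live in one class; GetSequenceHashCode repeats the TL;DR method. The int arrays stand in for the pretend hash codes, relying on the fact that int.GetHashCode() returns the value itself.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SequenceHashExtensions
{
    // XOR-based combination (renamed here for the comparison).
    public static int GetXorHashCode<T>(this IList<T> sequence)
    {
        return sequence.Aggregate(0, (total, item) => total ^ item.GetHashCode());
    }

    // Position-sensitive combination, as in the TL;DR method.
    public static int GetSequenceHashCode<T>(this IList<T> sequence)
    {
        const int seed = 487;
        const int modifier = 31;

        unchecked
        {
            return sequence.Aggregate(seed, (current, item) =>
                (current * modifier) + item.GetHashCode());
        }
    }
}

static class DuplicateDemo
{
    static void Main()
    {
        // Stand-ins for foo=8, bar=16, spam=32.
        var a = new[] { 8, 16, 32 };
        var c = new[] { 8, 16, 32, 8, 8 };
        var d = new[] { 8, 16, 32, 8, 8, 32, 8, 32, 8 };

        Console.WriteLine(a.GetXorHashCode() == c.GetXorHashCode());           // True
        Console.WriteLine(a.GetXorHashCode() == d.GetXorHashCode());           // True
        Console.WriteLine(a.GetSequenceHashCode() == c.GetSequenceHashCode()); // False
    }
}
```

The multiply-and-add version is still only a hash, so collisions between different lists remain possible; it simply does not collapse reordered or duplicated items by construction.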


Should the hash be representative of the contents of the list? In other words, will you use a hash to determine potential equality? If not, just create a new Guid and use it.

If the identifier should represent the contents of the list, you can either generate a hash code based on the contents of the list (this will be inefficient, since you cannot cache the value because the contents of the list may change) or forgo the hash altogether and use Enumerable.SequenceEqual to determine equality.
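If the lists need to be found again by content, one way to combine both ideas is an equality comparer that uses Enumerable.SequenceEqual for equality and a content-based hash for bucketing. This is only a sketch: SequenceComparer is a hypothetical class, not something provided by the framework, and its hash uses the seed and multiplier from the other answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical comparer: two lists are equal when they hold equal items in the same order.
sealed class SequenceComparer<T> : IEqualityComparer<IList<T>>
{
    public bool Equals(IList<T> x, IList<T> y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.SequenceEqual(y);
    }

    public int GetHashCode(IList<T> list)
    {
        if (list == null) return 0;

        unchecked
        {
            int hash = 487;
            foreach (T item in list)
            {
                // Null items contribute 0 instead of throwing.
                hash = (hash * 31) + (item == null ? 0 : item.GetHashCode());
            }
            return hash;
        }
    }
}

static class SequenceComparerDemo
{
    static void Main()
    {
        var lookup = new Dictionary<IList<string>, string>(new SequenceComparer<string>());
        lookup[new List<string> { "foo", "bar", "spam" }] = "first list";

        Console.WriteLine(lookup.ContainsKey(new[] { "foo", "bar", "spam" })); // True
        Console.WriteLine(lookup.ContainsKey(new[] { "spam", "bar", "foo" })); // False
    }
}
```

A list used as a key this way must not be modified while it is in the dictionary, because its hash code would change and the entry could no longer be found.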


Here is an example of how I would implement the hash code for a List<T>. First of all, if you are going to get a hash code for a particular object, you really need to make sure that the object does not change. If the object changes, your hash code is no longer valid.

The best way to work with a list that can be "frozen" (meaning that no items are added or removed after a certain point) is to call AsReadOnly. This gives you a ReadOnlyCollection<T>. Note that the returned collection is a read-only wrapper around the original list, so the original list must not be modified afterwards either. The implementation below depends on ReadOnlyCollection<T> just to be safe, so keep that in mind:

    using System;
    using System.Collections.Generic;
    using System.Collections.ObjectModel;
    using System.Linq;

    class Example
    {
        static void Main()
        {
            var seqOne = new List<int> { 1, 2, 3, 4, 5, 6 };
            var seqTwo = new List<int> { 6, 5, 4, 3, 2, 1 };

            var seqOneCode = seqOne.AsReadOnly().GetSequenceHashCode();
            var seqTwoCode = seqTwo.AsReadOnly().GetSequenceHashCode();

            Console.WriteLine(seqOneCode == seqTwoCode);
        }
    }

    static class Extensions
    {
        public static int GetSequenceHashCode<T>(this ReadOnlyCollection<T> sequence)
        {
            return sequence
                .Select(item => item.GetHashCode())
                .Aggregate((total, nextCode) => total ^ nextCode);
        }
    }

Oh, one more thing - make sure your MyRichObject type has a good implementation of GetHashCode , otherwise your hash code for the list will potentially give a lot of false positives when comparing.
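As a rough illustration of what a reasonable GetHashCode could look like, here is a sketch of MyRichObject. The question does not show the type, so the Id and Name members are assumptions made purely for this example; the point is that Equals and GetHashCode are built from the same immutable fields.

```csharp
using System;

// Hypothetical shape for MyRichObject; Id and Name are illustrative only.
sealed class MyRichObject : IEquatable<MyRichObject>
{
    private readonly int id;
    private readonly string name;

    public MyRichObject(int id, string name)
    {
        this.id = id;
        this.name = name;
    }

    public int Id { get { return id; } }
    public string Name { get { return name; } }

    public bool Equals(MyRichObject other)
    {
        return other != null && id == other.id && string.Equals(name, other.name);
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as MyRichObject);
    }

    public override int GetHashCode()
    {
        unchecked
        {
            // Combine the same fields that Equals compares.
            int hash = 17;
            hash = (hash * 31) + id;
            hash = (hash * 31) + (name == null ? 0 : name.GetHashCode());
            return hash;
        }
    }
}

static class MyRichObjectDemo
{
    static void Main()
    {
        var x = new MyRichObject(1, "foo");
        var y = new MyRichObject(1, "foo");

        Console.WriteLine(x.Equals(y));                        // True
        Console.WriteLine(x.GetHashCode() == y.GetHashCode()); // True
    }
}
```

Objects that compare equal must return the same hash code; without that, any list-level hash built from the items will disagree for lists that should match.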
