What is the best way to compute the hash code of a class with string properties?

I have a class with string properties, and I need to override its GetHashCode() method.

    class A
    {
        public string Prop1 { get; set; }
        public string Prop2 { get; set; }
        public string Prop3 { get; set; }
    }

The first idea is to do something like this:

    public override int GetHashCode()
    {
        return Prop1.GetHashCode() ^ Prop2.GetHashCode() ^ Prop3.GetHashCode();
    }

Second idea:

    public override int GetHashCode()
    {
        return String.Join(";", new[] { Prop1, Prop2, Prop3 }).GetHashCode();
    }
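To compare the two ideas concretely, here is a minimal sketch that puts both bodies on the question's class A as ordinary methods. The names HashByXor and HashByJoin are hypothetical, chosen only so both can live on one class. The null guards on the XOR version are an addition: as written, the first idea throws NullReferenceException when any property is null, while String.Join already substitutes an empty string for null elements.

```csharp
using System;

// The question's class A with both candidate hash functions side by side.
// HashByXor / HashByJoin are hypothetical names used only for comparison.
public class A
{
    public string Prop1 { get; set; }
    public string Prop2 { get; set; }
    public string Prop3 { get; set; }

    // Idea 1: XOR the individual hash codes. The ?. and ?? 0 guards are added
    // so that a null property contributes 0 instead of throwing.
    public int HashByXor() =>
        (Prop1?.GetHashCode() ?? 0) ^ (Prop2?.GetHashCode() ?? 0) ^ (Prop3?.GetHashCode() ?? 0);

    // Idea 2: join the values with a separator and hash the resulting string.
    // String.Join treats null elements as empty strings, so no guard is needed.
    public int HashByJoin() =>
        String.Join(";", new[] { Prop1, Prop2, Prop3 }).GetHashCode();
}
```

With this class, `new A { Prop1 = "foo", Prop2 = "bar", Prop3 = "baz" }.HashByXor()` always equals the same call with Prop1 and Prop2 swapped, because XOR is commutative. Note also that on .NET Core and later, string.GetHashCode() is randomized per process, so neither value should be stored or compared across runs.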

What is the best way?

2 answers

You should not just XOR them together, because XOR does not take the order of the values into account. Imagine you have two objects:

 "foo", "bar", "baz" 

and

 "bar", "foo", "baz" 

With plain XOR, both of them will have the same hash. Fortunately, this is pretty easy to get around. This is the code I use to combine hashes:

    static int MultiHash(IEnumerable<object> items)
    {
        Contract.Requires(items != null);

        int h = 0;

        foreach (object item in items)
        {
            h = Combine(h, item != null ? item.GetHashCode() : 0);
        }

        return h;
    }

    static int Combine(int x, int y)
    {
        unchecked
        {
            // This isn't a particularly strong way to combine hashes, but it's
            // cheap, respects ordering, and should work for the majority of cases.
            return (x << 5) + 3 + x ^ y;
        }
    }
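A minimal sketch of wiring MultiHash and Combine into the question's class A. The helpers are copied into a hypothetical static class HashHelpers so the block compiles on its own, and Contract.Requires is swapped for a plain ArgumentNullException check, because Code Contracts preconditions are only enforced when the Code Contracts tooling is used. An Equals override is included as well, since objects that are equal must report equal hash codes.

```csharp
using System;
using System.Collections.Generic;

public class A
{
    public string Prop1 { get; set; }
    public string Prop2 { get; set; }
    public string Prop3 { get; set; }

    // Equality compares the same three properties that feed the hash.
    public override bool Equals(object obj) =>
        obj is A other && Prop1 == other.Prop1 && Prop2 == other.Prop2 && Prop3 == other.Prop3;

    // Order matters: Prop1, Prop2, Prop3 are combined in sequence.
    public override int GetHashCode() =>
        HashHelpers.MultiHash(new object[] { Prop1, Prop2, Prop3 });
}

// Hypothetical container for the answer's helpers.
public static class HashHelpers
{
    public static int MultiHash(IEnumerable<object> items)
    {
        if (items == null) throw new ArgumentNullException(nameof(items));

        int h = 0;
        foreach (object item in items)
        {
            // Null items contribute 0 instead of throwing.
            h = Combine(h, item != null ? item.GetHashCode() : 0);
        }
        return h;
    }

    public static int Combine(int x, int y)
    {
        unchecked
        {
            // '+' binds tighter than '^', so this is ((x * 33) + 3) ^ y.
            return (x << 5) + 3 + x ^ y;
        }
    }
}
```

Because each step multiplies the running hash by 33 before XOR-ing in the next value, swapping "foo" and "bar" almost always changes the result, unlike plain XOR. On .NET Core 2.1 and later, the built-in `System.HashCode.Combine(Prop1, Prop2, Prop3)` is an order-sensitive alternative that also handles null values.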

There are many ways to combine hashes, but they are usually something very simple like this. If for some reason this doesn't work for your situation, MurmurHash has a fairly robust mixing function you can borrow.
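One way to act on the MurmurHash suggestion is sketched below. It is an assumption-laden adaptation, not a MurmurHash3 implementation: each object's 32-bit GetHashCode() value is treated as one input block and run through MurmurHash3_x86_32's per-block mixing step and its fmix32 finalizer. Because of that, it will not reproduce MurmurHash3 outputs for byte strings. The class and method names are hypothetical; BitOperations.RotateLeft requires .NET Core 3.0 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Numerics; // BitOperations.RotateLeft

// Hypothetical helper borrowing MurmurHash3_x86_32's mixing constants.
public static class MurmurStyleHash
{
    const uint C1 = 0xcc9e2d51;
    const uint C2 = 0x1b873593;

    // MurmurHash3's per-block step: scramble the block k, fold it into h.
    public static uint MixBlock(uint h, uint k)
    {
        unchecked
        {
            k *= C1;
            k = BitOperations.RotateLeft(k, 15);
            k *= C2;
            h ^= k;
            h = BitOperations.RotateLeft(h, 13);
            return h * 5 + 0xe6546b64;
        }
    }

    // MurmurHash3's 32-bit finalizer (fmix32), which spreads bits so that
    // small input differences affect the whole output.
    public static uint Fmix32(uint h)
    {
        unchecked
        {
            h ^= h >> 16;
            h *= 0x85ebca6b;
            h ^= h >> 13;
            h *= 0xc2b2ae35;
            h ^= h >> 16;
            return h;
        }
    }

    // Combines the hash codes of items in order; null contributes 0.
    public static int Combine(IEnumerable<object> items, uint seed = 0)
    {
        if (items == null) throw new ArgumentNullException(nameof(items));

        uint h = seed;
        uint count = 0;
        foreach (object item in items)
        {
            uint k = unchecked((uint)(item?.GetHashCode() ?? 0));
            h = MixBlock(h, k);
            count++;
        }

        unchecked
        {
            // MurmurHash3 XORs in the input length in bytes (4 per block).
            h ^= count * 4;
            return (int)Fmix32(h);
        }
    }
}
```

A GetHashCode override could then return `MurmurStyleHash.Combine(new object[] { Prop1, Prop2, Prop3 })`. It costs a few more multiplications per value than the shift-and-add Combine, in exchange for better bit mixing.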


Just XOR the hashes of the strings together. This is cheaper (performance-wise) than string concatenation, and as far as I can see, it is no more prone to collisions. Suppose each string is 5 characters long and each character takes 1 byte. With the first approach, you hash 15 bytes into 4 bytes (an int). With the second, you concatenate all 3 strings (an expensive operation) to get one 15-byte string, and then hash that down to 4 bytes. Both reduce 15 bytes to 4, so in theory both are fairly similar in terms of collisions.

In reality, there is a small difference in the probability of collisions, but in practice it does not always matter; it depends on the data the strings will hold. Suppose all three strings are equal and each hash has the value 0001 (a small number, purely as an example). XORing the first two gives 0000, and XORing in the third brings you back to 0001, the same hash a single string would produce. More generally, any two equal strings cancel each other out. Concatenating the strings avoids this, at some cost in performance (if you are writing performance-critical code, I would not concatenate strings in an inner loop).
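The cancellation effect can be checked directly. The sketch below uses a hypothetical helper class with one XOR-based and one join-based function over three strings; it shows that three equal strings XOR to the hash of a single string, and that any pair of equal strings drops out entirely, so ("foo", "foo", "baz") and ("bar", "bar", "baz") get the same XOR hash.

```csharp
using System;

// Hypothetical helpers contrasting the two approaches from the question.
public static class XorCancellationDemo
{
    // XOR-based: h(a) ^ h(b) ^ h(c), with null mapped to 0.
    public static int Xor3(string a, string b, string c) =>
        (a?.GetHashCode() ?? 0) ^ (b?.GetHashCode() ?? 0) ^ (c?.GetHashCode() ?? 0);

    // Concatenation-based: hash of "a;b;c" (null treated as empty).
    public static int Joined3(string a, string b, string c) =>
        String.Join(";", a, b, c).GetHashCode();
}
```

Here `Xor3("foo", "foo", "baz")` is exactly `"baz".GetHashCode()`, whatever the first pair is, while `Joined3` hashes the distinct strings "foo;foo;baz" and "bar;bar;baz", which collide only by ordinary chance.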

So, in the end, I haven't really given an answer, for the simple reason that there isn't a single right one. It all depends on where and how the hash will be used.

