Is it possible to write a hash function to compare with many?

Question

Is it possible to write a hash function to compare with many?

Is it possible to write a hash function for the following comparison logic?

Two instances of My are equal if at least two properties from (A, B, C) coincide.

Part of the equal is simple, but I'm on the part of the hash code, and part of me thinks that this may not be possible.

 class MyOtherComparer : IEqualityComparer<My> { public bool Equals(My x, My y) { if (Object.ReferenceEquals(x, y)) return true; if (Object.ReferenceEquals(x, null) || Object.ReferenceEquals(y, null)) return false; int matches = 0; if(xA == yA) matches++; if(xB == yB) matches++; if(xC == yC) matches++; // match on two out of three return (matches > 1) } // If Equals() returns true for a pair of objects // then GetHashCode() must return the same value for these objects. public int GetHashCode(My x) { // ??? } }

UPDATE . In addition to Reed Kopsey's correct answer, a very important point about the overall usefulness of the fuzzy comparator is clearly indicated by Ethan Brown - see also his answer to a complete understanding of what underlies this Question / Answer.

+7

c # iequalitycomparer

Aaron anodide Jun 04 '12 at 17:45

source share

6 answers

The hash code should be the same for two equal objects, but it should not be different for two different objects. You can return the same value for all objects satisfying IEqualityComparer consumers, but I don’t know how to get any benefit from the hash in your situation.

+1

C.Evenhuis Jun 04 '12 at 17:52

source share

Is it possible to write a hash function for the following comparison logic?

Yes. You can always write a hash code for anything. The question is how effective it is. Regardless, you can always:

 public int GetHashCode() { return 0; }

It will always work, but it is terribly * inefficient *.

+1

Servy Jun 04 '12 at 17:53

source share

Suppose we have 2 objects A, B. Each of them has properties p1, p2 and p3. Assume that A.p1 == B.p1 and A.p3 == B.p3, if the hash function depends on p2, it will be different for A and B, so they are not equal. If you want to calculate a hash function based on p1 and p3, there are many examples, the hash function will not return the correct hash value, and many equal objects will not be equal. I think we cannot have a variable function. You can use a constant, but if you want to use it as a hash key in a dictionary or hash table, you will not get O (1) complexity.

+1

Kamran Amini Jun 04 '12 at 17:56

source share

The main problem with getting a volatile hash function is that you cannot provide equality transitivity. Usually equality is considered transitive. That is, A = B and B = C means that A = C (which also implies that A, B and C will have the same hash code). However, with your definition of equality, you could have A = B, B = C, and A! = C. In the ideal case, unequal elements would have different hash codes, so A and C would have different hash codes; but they cannot, because both are equal to B, so they must have the same hash code.

The only way to get a volatile hash function is to find out about your shared collection. You will need to split the collection into “equality boxes”, where each element in the bin will be equal to some other element in the box (including the possibility of having one bin). After you have done this separation, you can use it to generate a volatile algorithm (provided that you get more than one bean) to generate a hash code.

The fact of the matter is that the idea of equality cells is that there can be many such hopper configurations. As a selection criterion, you can increase the number of mailboxes (to improve the performance of the search in the hash table). The degenerative case (as indicated by Reed Copsie's correct answer) is that you put everything in the same box (although, as the supercards indicate in the comments below, the name "equality boxes" then becomes misleading). This does not violate any hash value restrictions, but it will lead to poor performance in algorithms that expect to have value for creating a non-degenerate split.

As supercards show, in order to satisfy hash value restrictions, the following should be true: if two elements are in two different cells, they should not be equal (however, two elements in the same box should not be equal).

+1

Ethan brown Jun 04 '12 at 18:17

source share

Seeing that your real problem is with the Except extension method, I decided to offer something for you, although this is actually not the answer.

 public class EqualityComparer<T> : IEqualityComparer<T> { private readonly Func<T, T, bool> _comparer; private readonly Func<T, int> _hashCoder; public EqualityComparer(Func<T, T, bool> comparer, Func<T, int> hashCoder = null) { if (comparer == null) { throw new ArgumentNullException("comparer"); } this._comparer = comparer; this._hashCoder = hashCoder ?? (x => 0); } public bool Equals(T x, T y) { return this._comparer(x, y); } public int GetHashCode(T obj) { return this._hashCoder(obj); } }

And then you can use it like this:

 arr1.Except(arr2, new EqualityComparer<dynamic>((x, y) => { if (ReferenceEquals(x, y)) return true; if (ReferenceEquals(x, null) || ReferenceEquals(y, null)) return false; var matches = 0; if (xA == yA) matches++; if (xB == yB) matches++; if (xC == yC) matches++; return (matches > 1); }));

0

Simplevar Jun 04 '12 at 18:21

source share

Reed copsey · Accepted Answer · 2012-06-04T17:52:34+0000

Yes it is possible. The simplest implementation would always be to return a constant.

 public int GetHashCode(My x) { return 0; }

The GetHashCode documentation says:

Implementations are needed to ensure that if the Equals method returns true for two objects x and y, then the value returned by the GetHashCode method for x must equal the value returned for y.

However, you can freely return the same hash code for two objects that are also not equal.

This suggests that this may cause some algorithms to perform very poorly, as you will get a lot of hash collisions. However, given the nature of your odd / unique equality check, this may be required.

Please note that this will be problematic anyway. Given your logic, it is possible to have three objects where comparer.Equals(foo, bar)==true and comparer.Equals(foo, baz)==true , but comparer.Equals(baz, bar)==false . This will probably be problematic in many cases when IEqualityComparer<T> .

Is it possible to write a hash function to compare with many?

More articles: