So, you need something that converts a string to an object, and when you convert two strings to two of these objects, you want to compare the objects for equality using your own set of rules.
Your example is about upper and lower case, but the rules could also cover slashes and backslashes. You might even want to treat the word "dollar" as equal to $.
Suppose you split the collection of all possible strings into subcollections of strings that you define as equal. In this case, "Hello" will be in the same subcollection as "HELLO" and "hElLO". Perhaps "c:\temp" will be in the same subcollection as "c:/TEMP".
If you could find something that identifies a subcollection, you could say that all strings belonging to the same subcollection have the same identifier. Or, in other words: all strings that you define as equal have the same identifier.
If this were possible, it would be sufficient to compare the identifiers. If two strings have the same subcollection identifier, they belong to the same subcollection and are therefore considered equal according to our definition of equality.
Let's call this identifier the normalized string value. The constructor of your CaseInsensitiveString can convert the input string to its normalized string value. To test two objects for equality, all we need to do is check whether they have the same normalized value.
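As a rough illustration of the idea (not part of the class itself), the sketch below groups strings into subcollections by an identifier. The names `SubcollectionSketch`, `Identifier` and `Group` are hypothetical, and the identifier is assumed to be simply the lowercase form of the string.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SubcollectionSketch
{
    // Hypothetical identifier: here just the lowercase form of the string.
    public static string Identifier(string s)
    {
        return s.ToLowerInvariant();
    }

    // Puts every string into the subcollection named by its identifier.
    public static Dictionary<string, List<string>> Group(IEnumerable<string> strings)
    {
        return strings
            .GroupBy(Identifier)
            .ToDictionary(g => g.Key, g => g.ToList());
    }
}
```

With this, "Hello", "HELLO" and "hElLO" all end up under the identifier "hello", so comparing identifiers is enough to decide equality.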
An example of string normalization would be:
- make the string lowercase
- convert all slashes to backslashes
- convert every word USD to $
- remove all thousands separators, so that every number is written without them
- etc., depending on when you want the strings to be equal.
According to the rules above, the following strings all lead to the same normalized string:
- White House $1,000,000
- white house $1000000
- White House USD 1,000,000
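The rules listed above could be sketched roughly as follows. This is only an assumption about how the steps might be coded: the class name `NormalizationSketch` and the regular expressions are mine, and they only cover the simple cases shown in the examples (for instance, a space between `$` and the number is also dropped so that all variants meet).

```csharp
using System;
using System.Text.RegularExpressions;

public static class NormalizationSketch
{
    // Hypothetical helper, not the Normalize method of the class itself.
    public static string Normalize(string str)
    {
        if (str == null) return string.Empty;

        string s = str.ToLowerInvariant();                    // 1. lowercase
        s = s.Replace('/', '\\');                             // 2. slashes -> backslashes
        s = Regex.Replace(s, @"\busd\b", "$$");               // 3. word "usd" -> "$" ("$$" is a literal $)
        s = Regex.Replace(s, @"\$\s+", "$$");                 //    drop whitespace after "$"
        s = Regex.Replace(s, @"(?<=\d),(?=\d{3}\b)", "");     // 4. remove thousands separators
        return s;
    }
}
```

Under these assumptions, "White House $1,000,000", "white house $1000000" and "White House USD 1,000,000" all normalize to the same value.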
Anything can serve as the normalized string, as long as all strings that we define as equal map to the same normalized string. A good example would be: white house $1000000
Note: I will not go into detail on how to detect words like USD or thousands separators. The important thing is that you understand the meaning of the normalized string.
Having said that, the only hard part is finding a stringIdentifier. The rest of the class is pretty simple:
Construction. The constructor takes a string and determines its subcollection, i.e. its normalized value. I also added a default constructor.

    public class CaseInsensitiveString : IEquatable<CaseInsensitiveString>
    {
        private string normalized = "";

        public CaseInsensitiveString(string str)
        {
            this.normalized = Normalize(str);
        }

        public CaseInsensitiveString()
        {
            // default: the normalized form of the empty string
            this.normalized = Normalize(string.Empty);
        }

        // Equals, GetHashCode, the operators and Normalize are shown in the snippets that follow
    }
Equality: by definition, two objects are equal if they have the same normalized value.
See MSDN: How to: Define Value Equality for a Type

    public bool Equals(CaseInsensitiveString other)
    {
        // false if other is null (ReferenceEquals avoids calling our own operator ==)
        if (object.ReferenceEquals(other, null)) return false;

        // optimization for same object
        if (object.ReferenceEquals(this, other)) return true;

        // false if other is a different type, for instance a subclass
        if (this.GetType() != other.GetType()) return false;

        // if here: everything else is the same, compare the normalized values
        return this.normalized == other.normalized;
    }
Please note that this last line is the only code where we do the actual equality check!
All other equality functions use only the Equals function defined above:
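    public override bool Equals(object other)
    {
        return this.Equals(other as CaseInsensitiveString);
    }

    public override int GetHashCode()
    {
        // equal objects have equal normalized values, hence equal hash codes
        return this.normalized.GetHashCode();
    }

    public static bool operator ==(CaseInsensitiveString x, CaseInsensitiveString y)
    {
        if (object.ReferenceEquals(x, null))
        {
            // null is only equal to null
            return object.ReferenceEquals(y, null);
        }
        return x.Equals(y);
    }

    // C# requires operator != whenever operator == is defined
    public static bool operator !=(CaseInsensitiveString x, CaseInsensitiveString y)
    {
        return !(x == y);
    }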
So now you can do the following:
    var x = new CaseInsensitiveString("White House $1,000,000");
    var y = new CaseInsensitiveString("white house $1000000");
    if (x == y) ...
Now the only thing left to implement is the Normalize function. Once you know when two strings are considered equal, you know how to normalize.
Suppose two strings are equal if they match when case is ignored and slashes are treated the same as backslashes.
If the Normalize function returns the string in lowercase with all slashes replaced by backslashes, then two strings that we consider equal will have the same normalized value.

    private string Normalize(string str)
    {
        // treat null like the empty string so the constructors never throw
        if (str == null) return string.Empty;

        // ToLowerInvariant keeps the result independent of the current culture
        return str.ToLowerInvariant().Replace('/', '\\');
    }
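Putting the pieces together, a compact version of the whole class might look like the sketch below. It makes a few assumptions of my own: the class is `sealed`, which replaces the GetType check because no subclass can exist, Normalize is `static`, and the `Demo.CountDistinct` helper is a hypothetical addition that only shows why GetHashCode has to agree with Equals.

```csharp
using System;
using System.Collections.Generic;

public sealed class CaseInsensitiveString : IEquatable<CaseInsensitiveString>
{
    private readonly string normalized;

    public CaseInsensitiveString(string str)
    {
        this.normalized = Normalize(str);
    }

    public CaseInsensitiveString() : this(string.Empty) { }

    public bool Equals(CaseInsensitiveString other)
    {
        if (ReferenceEquals(other, null)) return false;
        if (ReferenceEquals(this, other)) return true;
        // the only place where the actual comparison happens
        return this.normalized == other.normalized;
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as CaseInsensitiveString);
    }

    public override int GetHashCode()
    {
        return this.normalized.GetHashCode();
    }

    public static bool operator ==(CaseInsensitiveString x, CaseInsensitiveString y)
    {
        if (ReferenceEquals(x, null)) return ReferenceEquals(y, null);
        return x.Equals(y);
    }

    public static bool operator !=(CaseInsensitiveString x, CaseInsensitiveString y)
    {
        return !(x == y);
    }

    // Lowercase, and slashes treated as backslashes; null counts as empty.
    private static string Normalize(string str)
    {
        if (str == null) return string.Empty;
        return str.ToLowerInvariant().Replace('/', '\\');
    }
}

public static class Demo
{
    // Hypothetical helper: counts strings that differ under our equality rules.
    public static int CountDistinct(IEnumerable<string> strings)
    {
        var set = new HashSet<CaseInsensitiveString>();
        foreach (string s in strings)
        {
            set.Add(new CaseInsensitiveString(s));
        }
        return set.Count;
    }
}
```

Because equal objects produce equal hash codes, the class also works as a key in HashSet and Dictionary: @"c:\temp" and "C:/TEMP" collapse into a single entry.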