How can System.String be wrapped properly for case-insensitivity?

This question is not about handling Windows paths; I use them only as a concrete example of a case-insensitive string. (And if I changed the example now, a whole bunch of comments would become pointless.)


This may be similar to "Maybe creating a case-insensitive string class?", but there is not much discussion there. Also, I don't care about the tight language integration that string enjoys or about the performance optimizations of System.String.

Let's say I use a lot of Windows pathnames, which (as a rule) are not case-sensitive. (I am not actually concerned with the many details of real paths, such as \ vs. /, \\\\ being treated like \, file:// URLs, .., etc.) A simple wrapper could be:

    sealed class WindowsPathname : IEquatable<WindowsPathname>
        /* TODO: more interfaces from System.String */
    {
        public WindowsPathname(string path)
        {
            if (path == null) throw new ArgumentNullException(nameof(path));
            Value = path;
        }

        public string Value { get; }

        public override int GetHashCode()
        {
            return Value.ToUpperInvariant().GetHashCode();
        }

        public override string ToString()
        {
            return Value.ToString();
        }

        public override bool Equals(object obj)
        {
            var strObj = obj as string;
            if (strObj != null) return Equals(new WindowsPathname(strObj));

            var other = obj as WindowsPathname;
            if (other != null) return Equals(other);

            return false;
        }

        public bool Equals(WindowsPathname other)
        {
            // A LOT more needs to be done to make Windows pathnames equal.
            // This is just a specific example of the need for a case-insensitive string.
            return Value.Equals(other.Value, StringComparison.OrdinalIgnoreCase);
        }
    }

Yes, all or most of the interfaces implemented by System.String should probably be implemented as well, but the above seems sufficient for discussion.

Now I can write:

    var p1 = new WindowsPathname(@"c:\foo.txt");
    var p2 = new WindowsPathname(@"C:\FOO.TXT");
    bool areEqual = p1.Equals(p2); // true

This allows me to “talk about” WindowsPathname in my code rather than about an implementation detail such as StringComparison.OrdinalIgnoreCase. (Yes, this particular class could also be extended to handle \ vs. /, so that c:/foo.txt would equal C:\FOO.TXT, but that is not the point of this question.) Furthermore, this class (with additional interfaces) will be case-insensitive when instances are added to collections; there is no need to specify an IEqualityComparer. Finally, a specific class like this also makes it easier to prevent “nonsensical” operations, such as comparing a file system path to a registry key.
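For comparison, the comparer-based alternative that the wrapper is meant to replace looks roughly like the sketch below. The names ComparerDemo and sizes are made up for illustration; StringComparer.OrdinalIgnoreCase is the standard comparer from the framework.

```csharp
using System;
using System.Collections.Generic;

static class ComparerDemo
{
    public static bool Run()
    {
        // Plain strings as keys; the case-insensitivity comes from the comparer
        // handed to the collection, not from the key type itself.
        var sizes = new Dictionary<string, long>(StringComparer.OrdinalIgnoreCase);
        sizes[@"c:\foo.txt"] = 42;

        // The lookup succeeds although the casing differs.
        return sizes.ContainsKey(@"C:\FOO.TXT");
    }
}
```

Here every collection has to be created with the right comparer, which is exactly the detail a dedicated wrapper type would hide.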

Question: will such an approach work? Are there any serious and/or subtle flaws or other "gotchas"? (Again, this is about building a case-insensitive string class, not about handling Windows pathnames.)

+8
5 answers

I would create an immutable struct that holds a string and converts it to a standard case (for example, lowercase) in the constructor. Then you can also add an implicit conversion operator to simplify creation, and overload the equality operators. I think this is the easiest way to get the behavior, and it adds only a little overhead (the conversion happens only in the constructor).

Here is the code:

    public struct CaseInsensitiveString
    {
        private readonly string _s;

        public CaseInsensitiveString(string s)
        {
            // Normalize once; a null input is kept as null instead of throwing.
            _s = s == null ? null : s.ToLowerInvariant();
        }

        public static implicit operator CaseInsensitiveString(string d)
        {
            return new CaseInsensitiveString(d);
        }

        public override bool Equals(object obj)
        {
            return obj is CaseInsensitiveString && this == (CaseInsensitiveString)obj;
        }

        public override int GetHashCode()
        {
            // _s is null for default(CaseInsensitiveString) or a null input.
            return _s == null ? 0 : _s.GetHashCode();
        }

        public static bool operator ==(CaseInsensitiveString x, CaseInsensitiveString y)
        {
            return x._s == y._s;
        }

        public static bool operator !=(CaseInsensitiveString x, CaseInsensitiveString y)
        {
            return !(x == y);
        }
    }

Here is the usage:

    CaseInsensitiveString a = "STRING";
    CaseInsensitiveString b = "string";
    // a == b --> true

This also works for collections.
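A minimal sketch of the collection case, assuming a HashSet as the collection. The struct is a trimmed copy of the one above so that the snippet compiles on its own; CollectionDemo and CountDistinct are names made up for illustration.

```csharp
using System;
using System.Collections.Generic;

// Trimmed copy of the struct above, repeated only to keep this snippet self-contained.
public struct CaseInsensitiveString
{
    private readonly string _s;

    public CaseInsensitiveString(string s)
    {
        _s = s == null ? null : s.ToLowerInvariant();
    }

    public static implicit operator CaseInsensitiveString(string d)
    {
        return new CaseInsensitiveString(d);
    }

    public override bool Equals(object obj)
    {
        return obj is CaseInsensitiveString && _s == ((CaseInsensitiveString)obj)._s;
    }

    public override int GetHashCode()
    {
        return _s == null ? 0 : _s.GetHashCode();
    }
}

public static class CollectionDemo
{
    public static int CountDistinct()
    {
        // No IEqualityComparer is passed: the set relies on Equals/GetHashCode.
        var set = new HashSet<CaseInsensitiveString>();
        set.Add("STRING");   // implicit conversion from string
        set.Add("string");   // equal to the first entry, so it is not added again
        set.Add("other");
        return set.Count;
    }
}
```

Note that the original casing is lost with this design, because only the lowercased copy is stored.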

+8

So, you need something that converts a string into an object, and when you convert two strings into two such objects, you want to compare those objects for equality using your own set of rules.

Your example is about upper and lower case, but it could also be about slashes and backslashes; you might even want to define that the word "USD" equals $.

Suppose you partition the set of all possible strings into subsets of strings that you define as equal. In that case, "Hello" would be in the same subset as "HELLO" and "hElLO". Perhaps "c:\temp" would be in the same subset as "c:/TEMP".

If you could find something that identifies a subset, you could say that all strings belonging to the same subset have the same identifier. Or, in other words: all strings that you define as equal have the same identifier.

If that is possible, it is sufficient to compare the subset identifiers. If two strings have the same subset identifier, they belong to the same subset and are therefore considered equal according to our definition of equality.

Let this identifier be a normalized string value. The constructor of your CaseInsensitiveString can convert the input string to its normalized value. To test two objects for equality, all we need to do is check whether they have the same normalized value.

An example of string normalization would be:

  • convert the string to lowercase
  • convert all slashes to backslashes
  • convert every word USD to $
  • remove all thousands separators from numbers
  • etc., depending on when you want strings to be equal.

According to the rules above, the following strings all lead to the same normalized string:

  • White House $ 1,000,000
  • white house $ 1000000
  • White House USD 1,000,000

We can pick anything as the normalized string, as long as all strings that we define as equal have the same normalized string. A good choice would be

  • white house $ 1000000

Note: I will not go into detail on how to find things like the dollar word and thousands separators. What matters is that you understand the idea of a normalized string.
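For readers who want to see the rules in code anyway, here is a rough sketch under simplifying assumptions: "USD" is matched as a whole word, and a comma counts as a thousands separator only when it sits between a digit and a group of exactly three digits. The class name Normalizer and both regular expressions are made up for illustration and are not part of the answer's code.

```csharp
using System;
using System.Text.RegularExpressions;

public static class Normalizer
{
    public static string Normalize(string s)
    {
        // Assumption: a null input normalizes to the empty string.
        if (s == null) return "";

        // 1. lowercase, independent of the current culture
        string result = s.ToLowerInvariant();

        // 2. convert all slashes to backslashes
        result = result.Replace('/', '\\');

        // 3. replace the whole word "usd" with "$"
        //    ("$$" is how a literal dollar sign is written in a .NET replacement pattern)
        result = Regex.Replace(result, @"\busd\b", "$$");

        // 4. drop commas that separate groups of three digits, e.g. 1,000,000 -> 1000000
        result = Regex.Replace(result, @"(?<=\d),(?=\d{3}\b)", "");

        return result;
    }
}
```

With these rules, "White House USD 1,000,000" and "white house $ 1000000" normalize to the same value.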

Having said that, the only hard part is finding the string identifier, that is, the normalized value. The rest of the class is pretty simple:

Construction code. The constructor takes a string and determines its subset by normalizing it. I also added a default constructor.

    public class CaseInsensitiveString : IEquatable<CaseInsensitiveString>
    {
        private string normalized = "";

        public CaseInsensitiveString(string str)
        {
            this.normalized = Normalize(str);
        }

        public CaseInsensitiveString()
        {
            // a missing value is normalized like any other input
            this.normalized = Normalize(null);
        }
    }

Equality: by definition, two objects are equal if they have the same normalized value.

See MSDN: "How to: Define Value Equality for a Type"

    public bool Equals(CaseInsensitiveString other)
    {
        // false if other is null; ReferenceEquals avoids calling the overloaded ==
        if (object.ReferenceEquals(other, null)) return false;

        // optimization for the same object
        if (object.ReferenceEquals(this, other)) return true;

        // false if other is a different type, for instance a subclass
        if (this.GetType() != other.GetType()) return false;

        // if here: everything else is the same, compare the normalized values
        return this.normalized == other.normalized;
    }

Please note that this last line is the only place where the actual equality check happens!

All other equality members just use the Equals function defined above:

    public override bool Equals(object other)
    {
        return this.Equals(other as CaseInsensitiveString);
    }

    public override int GetHashCode()
    {
        return this.normalized.GetHashCode();
    }

    public static bool operator ==(CaseInsensitiveString x, CaseInsensitiveString y)
    {
        if (object.ReferenceEquals(x, null))
        {
            // x is null: equal only if y is also null
            // (ReferenceEquals, because y == null would call this operator again)
            return object.ReferenceEquals(y, null);
        }
        else
        {
            // x is not null
            return x.Equals(y);
        }
    }

    public static bool operator !=(CaseInsensitiveString x, CaseInsensitiveString y)
    {
        return !(x == y);
    }

So now you can do the following:

    var x = new CaseInsensitiveString("White House $1,000,000");
    var y = new CaseInsensitiveString("white house $1000000");
    if (x == y) ...

Now the only thing left to implement is the Normalize function. Once you know when two strings are considered equal, you know how to normalize.

Suppose two strings are considered equal if they differ only in case and in the use of slashes versus backslashes.

If the Normalize function returns the string in lowercase with every slash turned into a backslash, then two strings that we consider equal will have the same normalized value.

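    private string Normalize(string str)
    {
        // treat null like the empty string so the default constructor works
        if (str == null) return "";

        // lowercase (culture-independent) and turn every slash into a backslash
        return str.ToLowerInvariant().Replace('/', '\\');
    }

+3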

A shorter and lighter approach might be to create an extension method:

    public static class StringExt
    {
        public static bool IsSamePathAs(this string @this, string other)
        {
            if (@this == null) return other == null;
            if (object.ReferenceEquals(@this, other)) return true;

            // add other checks

            return @this.Equals(other, StringComparison.OrdinalIgnoreCase);
        }
    }

This requires much less code than creating a whole separate class, it has no performance overhead (it may even get inlined), it needs no additional allocations, and IMO it also expresses the intent clearly:

    var arePathsEqual = @"c:\test.txt".IsSamePathAs(@"C:\TEST.txt");

0

Hmm ... I don’t think the string case is the only problem you have. Let me ask you a couple of questions:

Is c:\myPath the same as c:/myPath ? What about file:////c:/myPath ? Or what about \\myMachine\c$\myPath ?

I understand where you are going and what you want to achieve, but it looks like you have tunnel vision on a simple problem: why build a framework that does what a simple .ToLower() vs. .ToLower() comparison does?

That said, if your problem domain goes beyond wrapping a string and includes evaluating the absolute equality of two given paths, then writing a class makes sense. But that will require a much more robust solution than the one you propose ...

HTH!

-1

First, call a spade a spade.
You must decide what the explicit responsibility of the class is.

Either you want the class to manage Windows path names, and then you cannot dismiss all the comments about that, because the code that handles case will be coupled with the code that handles paths. That coupling will make it impossible to test (and ensure the proper behavior of) the case handling without considering paths.

Or you want to introduce a CaseInvariantString; then name it accordingly (and maybe use it inside another class called WindowsPathname).

For more on class responsibility, cohesion, coupling, and other great concepts, I would recommend the following books:
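A rough sketch of that second option, assuming the split suggested above: a general CaseInvariantString that knows only about case, and a WindowsPathname that knows only about separators and delegates the case handling. Both classes and all member names are hypothetical illustrations, not code from the question.

```csharp
using System;

// Hypothetical general-purpose type: its only responsibility is case-insensitive equality.
public sealed class CaseInvariantString : IEquatable<CaseInvariantString>
{
    private readonly string _value;

    public CaseInvariantString(string value)
    {
        if (value == null) throw new ArgumentNullException(nameof(value));
        _value = value;
    }

    public bool Equals(CaseInvariantString other)
    {
        return other != null
            && string.Equals(_value, other._value, StringComparison.OrdinalIgnoreCase);
    }

    public override bool Equals(object obj) { return Equals(obj as CaseInvariantString); }
    public override int GetHashCode() { return StringComparer.OrdinalIgnoreCase.GetHashCode(_value); }
    public override string ToString() { return _value; }
}

// Hypothetical path type: its only responsibility is path syntax (here: separators).
public sealed class WindowsPathname : IEquatable<WindowsPathname>
{
    private readonly CaseInvariantString _path;

    public WindowsPathname(string path)
    {
        if (path == null) throw new ArgumentNullException(nameof(path));
        // Path-specific rule; the case rule lives in CaseInvariantString.
        _path = new CaseInvariantString(path.Replace('/', '\\'));
    }

    public bool Equals(WindowsPathname other) { return other != null && _path.Equals(other._path); }
    public override bool Equals(object obj) { return Equals(obj as WindowsPathname); }
    public override int GetHashCode() { return _path.GetHashCode(); }
    public override string ToString() { return _path.ToString(); }
}
```

With this split, the case handling can be tested through CaseInvariantString alone, without any path in sight.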

  • Clean Code by Robert C. Martin (Uncle Bob)
  • Code Complete by Steve McConnell

Secondly, wrapping a string inside a class to get case invariance is comparable to wrapping an integer in a PositiveInteger class. It can (and will) be considered over-engineering by some. There is a general tendency among developers to try to reach the pinnacle of object-oriented dogma; here it takes the form of wrapping every value type in a class (for example, an int in an ID class). So be sure to ask yourself some questions:

  • What is the cost of adopting such a practice?
  • What are the benefits?
  • What are the difficulties this may cause?
  • Can I take a common approach to all my projects?
  • Do I have input from my technical lead or architect (or someone with similar authority) that this is good practice?

Finally, a simple technical point: you should not create a string inside your class. This is bad for performance. Since strings are immutable, calling ToUpperInvariant() in GetHashCode() creates a new String every time.

And as for case-insensitive paths: this does not hold outside of Windows
(on Mono, obviously, /foo != /Foo).
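One way to avoid the temporary upper-cased copy is to let a case-insensitive comparer compute the hash, as in the sketch below. This is a trimmed variant of the question's class for illustration only; StringComparer.OrdinalIgnoreCase is the standard framework comparer, and it matches the OrdinalIgnoreCase comparison already used in Equals.

```csharp
using System;

sealed class WindowsPathname : IEquatable<WindowsPathname>
{
    public WindowsPathname(string path)
    {
        if (path == null) throw new ArgumentNullException(nameof(path));
        Value = path;
    }

    public string Value { get; }

    public override int GetHashCode()
    {
        // No ToUpperInvariant() call, so this method creates no new string itself.
        return StringComparer.OrdinalIgnoreCase.GetHashCode(Value);
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as WindowsPathname);
    }

    public bool Equals(WindowsPathname other)
    {
        return other != null
            && string.Equals(Value, other.Value, StringComparison.OrdinalIgnoreCase);
    }
}
```

Using the same comparer kind for both Equals and GetHashCode also keeps the two consistent, which hash-based collections depend on.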

-1