Why is “-less” sorted after “hello” and not before it?

I see very strange sorting behavior using CaseInsensitiveComparer.DefaultInvariant. Words that start with a leading hyphen "-" end up sorted as if the hyphen were not there, rather than being sorted before the letters, which is what happens with other punctuation.

So, given {"hello", ".net", "-less"}, I get {".net", "hello", "-less"} instead of the expected {"-less", ".net", "hello"}.

Or, formulated as a test case:

    [TestMethod]
    public void TestMethod1()
    {
        var rg = new String[] { "x", "z", "y", "-less", ".net", "- more", "a", "b" };
        Array.Sort(rg, CaseInsensitiveComparer.DefaultInvariant);
        Assert.AreEqual(
            "- more,-less,.net,a,b,x,y,z",
            String.Join(",", rg)
        );
    }

... which fails like this:

 Assert.AreEqual failed. Expected:<- more,-less,.net,a,b,x,y,z>. Actual: <- more,.net,a,b,-less,x,y,z>. 

Any ideas what is going on?

Edit:

It seems that, by default, .NET applies linguistic rules when sorting strings, which causes leading hyphens to be sorted into strange places so that "co-op" and "coop" sort together. So, if you want your leading-hyphen words to sort to the front along with other punctuation, you have to tell it not to do that:

 Array.Sort(rg, (a, b) => String.CompareOrdinal(a, b)); 
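Note that String.CompareOrdinal is case-sensitive, while the comparer in the question was case-insensitive. A minimal sketch of both ordinal variants follows, using StringComparer.Ordinal and StringComparer.OrdinalIgnoreCase; the class and method names (OrdinalSortSketch, SortOrdinal, SortOrdinalIgnoreCase) and the mixed-case input are hypothetical and only for illustration.

```csharp
using System;

public static class OrdinalSortSketch
{
    // Hypothetical helper: sorts a copy by raw UTF-16 code unit values (case-sensitive).
    public static string[] SortOrdinal(string[] words)
    {
        var copy = (string[])words.Clone();
        Array.Sort(copy, StringComparer.Ordinal);
        return copy;
    }

    // Hypothetical helper: same idea, but case is ignored during the comparison.
    public static string[] SortOrdinalIgnoreCase(string[] words)
    {
        var copy = (string[])words.Clone();
        Array.Sort(copy, StringComparer.OrdinalIgnoreCase);
        return copy;
    }

    public static void Main()
    {
        // Same words as the test case, with two of them upper-cased (an assumption for the demo).
        var rg = new[] { "x", "Z", "y", "-less", ".net", "- more", "a", "B" };

        Console.WriteLine(string.Join(",", SortOrdinal(rg)));
        // prints "- more,-less,.net,B,Z,a,x,y"

        Console.WriteLine(string.Join(",", SortOrdinalIgnoreCase(rg)));
        // prints "- more,-less,.net,a,B,x,y,Z"
    }
}
```

With the case-sensitive variant, upper-case letters sort before all lower-case letters, which is probably not what a case-insensitive test expects.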
+6
4 answers

Comparison procedures use CultureInfo.InvariantCulture to determine the sort order and casing rules. String comparisons might have different results depending on the culture. For more information on culture-specific comparisons, see the System.Globalization namespace and Encoding and Localization. From here.

The interesting part:

A word sort performs a culture-sensitive comparison of strings in which certain non-alphanumeric Unicode characters might have special weights assigned to them. For example, the hyphen (-) might have a very small weight assigned to it so that "coop" and "co-op" appear next to each other in a sorted list. From here.
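The counterpart to a word sort is a string sort, selected with CompareOptions.StringSort, in which the hyphen and the apostrophe are treated like other non-alphanumeric symbols and come before alphanumeric characters. The following is a minimal sketch that sorts the question's words both ways so the two results can be compared. The names StringSortSketch and SortInvariant are hypothetical, and because culture-sensitive ordering can differ between .NET versions and platforms, no expected output is shown.

```csharp
using System;
using System.Globalization;

public static class StringSortSketch
{
    // Hypothetical helper: sorts a copy of the input with the invariant culture
    // and the given CompareOptions.
    public static string[] SortInvariant(string[] words, CompareOptions options)
    {
        CompareInfo ci = CultureInfo.InvariantCulture.CompareInfo;
        var copy = (string[])words.Clone();
        Array.Sort(copy, (a, b) => ci.Compare(a, b, options));
        return copy;
    }

    public static void Main()
    {
        var rg = new[] { "x", "z", "y", "-less", ".net", "- more", "a", "b" };

        // Default (word sort), case-insensitive.
        Console.WriteLine(string.Join(",",
            SortInvariant(rg, CompareOptions.IgnoreCase)));

        // String sort, case-insensitive: the hyphen is compared like other symbols.
        Console.WriteLine(string.Join(",",
            SortInvariant(rg, CompareOptions.IgnoreCase | CompareOptions.StringSort)));
    }
}
```

Unlike an ordinal comparison, this keeps the comparison linguistic for everything except the special treatment of the hyphen and apostrophe.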

+11

To sort the strings the way you want, you need to create a comparer class that compares the strings using the CompareInfo class. This class lets you specify different comparison options; the one that best suits your needs is OrdinalIgnoreCase.

From MSDN:

Ignored Search Values

Comparison operations, such as those performed by the IndexOf or LastIndexOf methods, can yield unexpected results if the search value is ignored. The search value is ignored if it is an empty string (""), a character or string consisting of characters whose code points are not considered in the operation because of the comparison options, or a value with code points that have no linguistic significance. If the search value for the IndexOf method is an empty string, for example, the return value is zero.

Note
Whenever possible, the application should use string comparison methods that accept a CompareOptions value to specify the kind of comparison expected. As a general rule, user-facing comparisons are best served by linguistic options (using the current culture), while security comparisons should specify Ordinal or OrdinalIgnoreCase.

I modified your test case and it now passes:

    using System;
    using System.Collections.Generic;
    using System.Globalization;
    using NUnit.Framework;

    // Compares strings by code point, ignoring case, so punctuation is not given special weight.
    public class MyComparer : Comparer<string>
    {
        private readonly CompareInfo compareInfo;

        public MyComparer()
        {
            compareInfo = CompareInfo.GetCompareInfo(CultureInfo.InvariantCulture.Name);
        }

        public override int Compare(string x, string y)
        {
            return compareInfo.Compare(x, y, CompareOptions.OrdinalIgnoreCase);
        }
    }

    public class Class1
    {
        [Test]
        public void TestMethod1()
        {
            var rg = new String[] { "x", "z", "y", "-less", ".net", "- more", "a", "b" };
            Array.Sort(rg, new MyComparer());
            Assert.AreEqual(
                "- more,-less,.net,a,b,x,y,z",
                String.Join(",", rg)
            );
        }
    }
+3

My guess would be that a hyphen immediately before a letter is ignored for sorting purposes. When you sort a list of words, you want “inter-national” and “international” to be next to each other, right? A hyphen followed by a space, on the other hand, is considered significant, which is why "- more" still sorts first.

+2

The sort order depends on the culture, so you cannot assume that characters will be sorted in ASCII order.

http://msdn.microsoft.com/en-us/library/a7zyyk0c.aspx

In your example, “h” (U+0048) comes before the dash (U+2013), so “hello” appears before “-less”. “.” (U+002E) comes before both, so “.net” appears first.

0