String.Empty.StartsWith(((char)10781).ToString()) always returns true?

Question


I am trying to process the following character: ⨝ (http://www.fileformat.info/info/unicode/char/2a1d/index.htm)

Checking whether an empty string starts with this character always returns true, which makes no sense! Why is this?

    // Visual Studio 2008 hides lines that contain this char literally
    // (a bug in Visual Studio?!?), so I wrote it as a code point instead.
    char specialChar = (char)10781;
    string specialString = specialChar.ToString();

    // prints 1
    Console.WriteLine(specialString.Length);

    // prints 10781
    Console.WriteLine((int)specialChar);

    // prints false
    Console.WriteLine(string.Empty.StartsWith("A"));

    // both print true, WTF?!?
    Console.WriteLine(string.Empty.StartsWith(specialString));
    Console.WriteLine(string.Empty.StartsWith(((char)10781).ToString()));

+6

string c# .net char unicode

Dxck Dec 12 '09 at 11:23


3 answers

Nice Unicode glitch ;-p

I'm not sure why it does this, but amusingly:

    Console.WriteLine(string.Empty.StartsWith(specialString)); // true
    Console.WriteLine(string.Empty.Contains(specialString));   // false
    Console.WriteLine("abc".StartsWith(specialString));        // true
    Console.WriteLine("abc".Contains(specialString));          // false

I guess this is a bit like the zero-width non-joiner that Jon mentions in his DevDays talk; some string functions see it, and some don't. And if a function doesn't see it, the question becomes "does (some string) start with an empty string", which is always true.
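The guess above can be checked with a small sketch. It assumes that StartsWith(string) uses the current culture while Contains(string) is ordinal, which is how the two methods are documented. The helper name IsIgnoredByCulture is hypothetical, and its result for ⨝ depends on the runtime and its collation data, so no output is claimed for that line.

```csharp
using System;

public static class PrefixCheck
{
    // Hypothetical helper: true when the current culture's comparison treats
    // a non-empty value as equal to the empty string, i.e. ignores it entirely.
    public static bool IsIgnoredByCulture(string value)
    {
        return value.Length > 0
            && string.Compare(value, string.Empty, StringComparison.CurrentCulture) == 0;
    }

    public static void Main()
    {
        string specialString = ((char)10781).ToString();

        // Every string starts with the empty string.
        Console.WriteLine("abc".StartsWith(string.Empty)); // prints True

        // Runtime-dependent: if this prints True, the culture-sensitive
        // StartsWith call is effectively being asked about an empty prefix.
        Console.WriteLine(IsIgnoredByCulture(specialString));

        // Contains(string) compares ordinally, so it does see the character.
        Console.WriteLine("abc".Contains(specialString)); // prints False
    }
}
```

If the middle line prints True on your machine, that would be consistent with the explanation above; newer runtimes with different collation data may well print False.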

+4

Marc Gravell Dec 12 '09 at 11:34


The underlying reason is that default string comparisons are locale-aware. This means they use locale data tables for comparisons (including equality).

Many (if not most) Unicode characters carry no weight in many locales, and are therefore treated as if they did not exist (or they do exist, but match anything or nothing).

See Michael Kaplan's blog post "Weight Sorting". This blog series contains a lot of background information (the APIs it covers are native, but as I understand it the mechanisms in .NET are the same).

Short version: this is a complex area. Getting the comparisons people expect for normal language text right leads to oddities with code points for glyphs outside your language.
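One way to look at what the locale tables do with a given character is to inspect its sort key and compare the culture-sensitive prefix test with the ordinal one. This is only a sketch: the class and helper names are hypothetical, and the sort key bytes and the culture-sensitive result vary by runtime, OS and culture, so only the ordinal result is annotated.

```csharp
using System;
using System.Globalization;

public static class SortKeyProbe
{
    // Hypothetical helper: prefix test on raw UTF-16 code units,
    // with no locale tables involved.
    public static bool OrdinalIsPrefix(string source, string prefix)
    {
        return CultureInfo.InvariantCulture.CompareInfo
            .IsPrefix(source, prefix, CompareOptions.Ordinal);
    }

    public static void Main()
    {
        string specialString = ((char)10781).ToString();
        CompareInfo compareInfo = CultureInfo.CurrentCulture.CompareInfo;

        // The sort key is what a culture-sensitive comparison actually compares.
        // A character without weight contributes nothing distinctive to it.
        SortKey key = compareInfo.GetSortKey(specialString);
        Console.WriteLine(BitConverter.ToString(key.KeyData));

        // Culture-sensitive prefix test: result depends on the collation data.
        Console.WriteLine(compareInfo.IsPrefix(string.Empty, specialString, CompareOptions.None));

        // Ordinal prefix test: a non-empty prefix never matches an empty string.
        Console.WriteLine(OrdinalIsPrefix(string.Empty, specialString)); // prints False
    }
}
```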

+4

Richard Dec 12 '09 at 12:53


RichardOD · Accepted Answer · 2009-12-12T11:34:27+0000

You can fix this by using an ordinal StringComparison.

From the MSDN docs:

When you specify either StringComparison.Ordinal or StringComparison.OrdinalIgnoreCase, the string comparison will be non-linguistic. That is, the features that are specific to the natural language are ignored when making comparison decisions. This means the decisions are based on simple byte comparisons and ignore casing or equivalence tables that are parameterized by culture. As a result, by explicitly setting the parameter to either StringComparison.Ordinal or StringComparison.OrdinalIgnoreCase, your code often gains speed, increases correctness, and becomes more reliable.

    char specialChar = (char)10781;
    string specialString = Convert.ToString(specialChar);

    // prints 1
    Console.WriteLine(specialString.Length);

    // prints 10781
    Console.WriteLine((int)specialChar);

    // prints false
    Console.WriteLine(string.Empty.StartsWith("A"));

    // prints false
    Console.WriteLine(string.Empty.StartsWith(specialString, StringComparison.Ordinal));
