IndexOf and ordinal string comparisons

Question

My problem is that String.IndexOf returns -1. I expect it to return 0.

Parameters:

text = C:\\Users\\User\\Desktop\\Sync\\̼ (note the trailing combining seagull below character, U+033C)

stringToTrim = C:\\Users\\User\\Desktop\\Sync\\

When I check the index using int index = text.IndexOf(stringToTrim);, the value of index is -1. I found that using ordinal string comparison solves this problem:

 int index = text.IndexOf(stringToTrim, StringComparison.Ordinal);

Reading online, I found that some distinct Unicode code points (e.g. U+00B5 and U+03BC) represent the same character, so it might be a good idea to go one step further and normalize both strings:

 int index = text.Normalize(NormalizationForm.FormKD).IndexOf(stringToTrim.Normalize(NormalizationForm.FormKD), StringComparison.Ordinal);

Is this the right approach for finding the index at which a string contains all the consecutive characters of another string? The idea being that you normalize when you want to check whether the characters match, but you don't normalize when you want to compare the characters by their encoded values (so that, say, equivalent characters are told apart)? Also, can someone explain why int index = text.IndexOf(stringToTrim); did not find a match at the beginning of the string? In other words, what is it really doing under the covers? I would expect it to search for the characters from the beginning of the string to the end.

string c# indexof unicode culture

Alexandru Dec 15 '14 at 20:32

2 answers

Yes, you should use StringComparison.Ordinal to make sure that the culture is ignored when comparing the values. This is a must for all strings that are meant to be culture-invariant. That includes file paths.

If you do not use StringComparison.Ordinal, you can introduce subtle bugs: http://msdn.microsoft.com/en-us/library/dd465121(v=vs.110).aspx

When culturally independent string data, such as XML tags, HTML tags, user names, file paths, and the names of system objects, is interpreted as if it were culture-sensitive, application code can be subject to subtle bugs, poor performance, and, in some cases, security issues.

A side benefit of StringComparison.Ordinal is better performance: http://msdn.microsoft.com/en-us/library/ms973919.aspx
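
Since the variable in the question is called stringToTrim, here is a minimal sketch of stripping a path prefix with an ordinal comparison. TrimPrefix and PathPrefix are hypothetical names made up for this illustration, not framework APIs, and the strings are the ones from the question:

```csharp
using System;

static class PathPrefix
{
    // Hypothetical helper: removes `prefix` from the start of `text` if it is
    // there, comparing raw UTF-16 code units (ordinal) and ignoring culture.
    public static string TrimPrefix(string text, string prefix)
    {
        return text.StartsWith(prefix, StringComparison.Ordinal)
            ? text.Substring(prefix.Length)
            : text;
    }

    static void Main()
    {
        // Path from the question, ending in U+033C (combining seagull below).
        string text = "C:\\Users\\User\\Desktop\\Sync\\\u033C";
        string stringToTrim = "C:\\Users\\User\\Desktop\\Sync\\";

        string rest = TrimPrefix(text, stringToTrim);
        Console.WriteLine(rest.Length); // 1: only the combining character is left
    }
}
```

Whether StringComparison.Ordinal or StringComparison.OrdinalIgnoreCase is the better fit for paths depends on the file system you target; the sketch uses the case-sensitive form to match the question.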

Piotrwolkowski Dec 15 '14 at 20:48

Peter Duniho · Accepted Answer · 2014-12-15T20:48:07+0000

The behavior makes sense to me. You are using a combining character, which combines with the preceding character and turns it into a different character that no longer matches the '\\' at the end of the search string. That prevents the whole search string from being found. If you searched for "C:\\Users\\User\\Desktop\\Sync" instead, it would be found.

Using StringComparison.Ordinal tells .NET to ignore the various rules for characters and look only at their exact ordinal values. That seems to be what you wanted, so yes, that is what you should do.

The "right approach" depends entirely on what behavior you want. Many string operations involve text that is entered or otherwise provided by the user, and these should be done in a culture-aware and Unicode-aware way. In other cases, that is not desirable. It is important to choose the approach that fits your needs.
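
A small sketch to reproduce this, using the strings from the question. The results of the culture-sensitive calls depend on the runtime and on the globalization library it uses, so the comments on those lines only repeat what this thread reports; the class name is arbitrary:

```csharp
using System;
using System.Globalization;

class CombiningDemo
{
    static void Main()
    {
        // Path from the question, ending in U+033C (combining seagull below).
        string text = "C:\\Users\\User\\Desktop\\Sync\\\u033C";
        string stringToTrim = "C:\\Users\\User\\Desktop\\Sync\\";

        // Culture-sensitive search (the default for IndexOf(string)).
        // The question reports -1 here.
        Console.WriteLine(text.IndexOf(stringToTrim));

        // Same search without the trailing backslash; this answer says
        // that this one is found.
        Console.WriteLine(text.IndexOf("C:\\Users\\User\\Desktop\\Sync"));

        // Ordinal search compares UTF-16 code units one by one.
        Console.WriteLine(text.IndexOf(stringToTrim, StringComparison.Ordinal)); // 0

        // StringInfo counts text elements (a base character plus its
        // combining marks) rather than individual char values.
        Console.WriteLine(text.Length);
        Console.WriteLine(new StringInfo(text).LengthInTextElements);
    }
}
```

Comparing the last two numbers shows whether the runtime treats the final backslash and the combining mark as a single text element.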
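
As for the normalization idea in the question, here is a sketch of how it interacts with an ordinal search, using the two code points the question mentions. One caveat that the sketch also shows: normalization can change the length of a string, so an index found in the normalized text does not necessarily point at the same position in the original text. The class name is arbitrary:

```csharp
using System;
using System.Text;

class NormalizeDemo
{
    static void Main()
    {
        string micro = "\u00B5"; // MICRO SIGN
        string mu = "\u03BC";    // GREEK SMALL LETTER MU

        // Ordinal comparison sees two different code units, so no match.
        Console.WriteLine(micro.IndexOf(mu, StringComparison.Ordinal)); // -1

        // After compatibility decomposition (NFKD) both become U+03BC.
        string normalizedText = micro.Normalize(NormalizationForm.FormKD);
        string normalizedValue = mu.Normalize(NormalizationForm.FormKD);
        Console.WriteLine(normalizedText.IndexOf(normalizedValue, StringComparison.Ordinal)); // 0

        // Caveat: decomposition can lengthen a string. U+00E9 (e with acute)
        // becomes 'e' followed by U+0301, so indices shift.
        string composed = "\u00E9";
        Console.WriteLine(composed.Normalize(NormalizationForm.FormKD).Length); // 2
    }
}
```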
