What does .NET String.Normalize do?

Question


The MSDN article on String.Normalize simply states:

Returns a new string whose binary representation is in a specific Unicode normalization form.

And it sometimes refers to "Unicode normalization form C."

I'm just wondering, what does that mean? How is this function useful in real-life situations?

+53

string .net

GeReV Jul 20 '10 at 8:17


4 answers

One difference between form C and form D is how letters with accents are represented: form C uses a single code point for a letter with an accent, while form D splits it into a letter and an accent.

A side effect is that it makes it easy to create a “remove accents” method.

using System.Globalization;
using System.Linq;

public static string RemoveAccents(string input)
{
    // Normalizing to FormD splits accented letters into letter + accent(s);
    // the filter then removes those accents (and other non-spacing marks).
    return new string(
        input
            .Normalize(System.Text.NormalizationForm.FormD)
            .ToCharArray()
            .Where(c => CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            .ToArray());
}
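To see the split between form C and form D directly, here is a minimal sketch that prints the UTF-16 code units of "é" in both forms. The class name FormCvsFormD and the CodeUnits helper are made up for illustration, and it assumes a runtime with full globalization support (not invariant mode).

```csharp
using System;
using System.Linq;
using System.Text;

public static class FormCvsFormD
{
    // Hypothetical helper: lists the UTF-16 code units of a string as U+XXXX.
    public static string CodeUnits(string s) =>
        string.Join(" ", s.Select(c => $"U+{(int)c:X4}"));

    public static void Main()
    {
        string composed = "\u00E9"; // "é" stored as a single code point

        // Form D decomposes into base letter + combining acute accent.
        string formD = composed.Normalize(NormalizationForm.FormD);
        // Form C recomposes the pair into the single precomposed code point.
        string formC = formD.Normalize(NormalizationForm.FormC);

        Console.WriteLine(CodeUnits(formD)); // U+0065 U+0301
        Console.WriteLine(CodeUnits(formC)); // U+00E9
    }
}
```

The combining acute accent U+0301 is a non-spacing mark, which is why the filter in RemoveAccents drops it and leaves the plain "e".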

+44

Hans Kesting Jul 20 '10 at 8:25


In Unicode, a (composed) character can have either a unique code point or a sequence of code points consisting of a base character and its accents.

Wikipedia lists, by way of example, Vietnamese ế (U+1EBF) and its decomposed sequence U+0065 (e) U+0302 (circumflex accent) U+0301 (acute accent).

string.Normalize() converts between the 4 normal forms in which a string can be encoded in Unicode.
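As a rough sketch of those four forms, the snippet below normalizes ế (U+1EBF) to each value of NormalizationForm and prints the resulting code units. The "ﬁ" ligature (U+FB01) is an extra character added here only to make the compatibility forms (KC and KD) visibly different; it is not part of the Wikipedia example. The class name FourForms and the CodeUnits helper are hypothetical, and full globalization support is assumed.

```csharp
using System;
using System.Linq;
using System.Text;

public static class FourForms
{
    // Hypothetical helper: lists the UTF-16 code units of a string as U+XXXX.
    public static string CodeUnits(string s) =>
        string.Join(" ", s.Select(c => $"U+{(int)c:X4}"));

    public static void Main()
    {
        // "ế" followed by the "ﬁ" ligature (the ligature is an illustrative assumption).
        string sample = "\u1EBF\uFB01";

        var forms = new[]
        {
            NormalizationForm.FormC,
            NormalizationForm.FormD,
            NormalizationForm.FormKC,
            NormalizationForm.FormKD,
        };

        foreach (NormalizationForm form in forms)
        {
            Console.WriteLine($"{form}: {CodeUnits(sample.Normalize(form))}");
        }
    }
}
```

Forms C and D leave the ligature alone and differ only in whether ế stays as U+1EBF or becomes U+0065 U+0302 U+0301. Forms KC and KD additionally replace the ligature with the two letters U+0066 U+0069.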

+6

devio Jul 20 '10 at 8:33


This link has a good explanation:

http://unicode.org/reports/tr15/#Norm_Forms

From what I can gather, it exists so that you can compare two Unicode strings for equality.

+5

Adam Houldsworth Jul 20 '10 at 8:22


Oded · Accepted Answer · 2010-07-20 08:22

It ensures that Unicode strings can be compared for equality (even if they represent the same characters with different code point sequences).

From Unicode Standard Annex #15:

Essentially, the Unicode Normalization Algorithm puts all combining marks in a specified order, and uses rules for decomposition and composition to transform each string into one of the Unicode Normalization Forms. A binary comparison of the transformed strings will then determine equivalence.
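A small sketch of that comparison: two spellings of "café" that look identical on screen differ under a binary (ordinal) comparison until both are normalized to the same form. The class name NormalizedEquality and the EqualsNormalized helper are hypothetical names chosen for this example; form C is picked here simply because it is the default for string.Normalize().

```csharp
using System;
using System.Text;

public static class NormalizedEquality
{
    // Hypothetical helper: ordinal comparison after normalizing both sides to form C.
    public static bool EqualsNormalized(string a, string b) =>
        string.Equals(
            a.Normalize(NormalizationForm.FormC),
            b.Normalize(NormalizationForm.FormC),
            StringComparison.Ordinal);

    public static void Main()
    {
        string composed = "caf\u00E9";    // é as one code point
        string decomposed = "cafe\u0301"; // e followed by a combining acute accent

        // The raw code unit sequences differ, so an ordinal comparison fails.
        Console.WriteLine(string.Equals(composed, decomposed, StringComparison.Ordinal)); // False

        // After normalization both strings have the same binary representation.
        Console.WriteLine(EqualsNormalized(composed, decomposed)); // True
    }
}
```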
