String and 4-byte Unicode characters

Question

I have a question about strings and characters in C#. I found that a string in C# is a Unicode string and that a char takes 2 bytes, so each char is one UTF-16 code unit. That is fine, but I also read on Wikipedia that some characters take 4 bytes in UTF-16.

I am making a program that lets you draw characters for alphanumeric displays. The program also has a tester in which you can type a line of text, and it draws it so you can see how it looks.

So how should I handle strings in which the user types a character that takes 4 bytes, i.e. 2 chars? I need to walk through the string char by char, find each char in my list, and draw it on the panel.

+4

string c# unicode astral-plane

Arxeiss Dec 23 '12 at 11:53


2 answers

Work entirely with String objects; do not use Char at all. Example using IndexOf:

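    var needle = "𝒜"; // U+1D49C, a code point outside the Basic Multilingual Plane
    var hayStack = "a code point outside the Basic Multilingual Plane: 𝒜";
    // Ordinal comparison matches the exact UTF-16 code units, independent of culture.
    var index = hayStack.IndexOf(needle, StringComparison.Ordinal);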

Most methods of the String class have overloads that accept either a Char or a String. Many of the static methods on Char also have overloads that take a String and an index. Just do not use Char.
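To go through the user's text one character at a time while still using only strings, one option is StringInfo from System.Globalization. The following is a minimal sketch, not part of the original answer; the class name TextElements and the method name Split are made up for illustration. Each element comes back as its own string, so a surrogate pair stays together. Note that a text element may also contain a base character plus its combining marks, which may or may not be what a display font lookup wants.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

static class TextElements
{
    // Hypothetical helper: splits text into text elements, each as its own string.
    // A code point above U+FFFF (a surrogate pair) is returned as one 2-char string.
    public static List<string> Split(string text)
    {
        var result = new List<string>();
        TextElementEnumerator e = StringInfo.GetTextElementEnumerator(text);
        while (e.MoveNext())
        {
            result.Add(e.GetTextElement());
        }
        return result;
    }
}
```

Each returned string could then be used as the key for finding the glyph to draw, for example in a Dictionary<string, ...> instead of a list of char.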

0

ligos Dec 23 '12 at 12:05


Esailija · Accepted Answer · 2012-12-23T11:57:07+0000

You could do:

    for (int i = 0; i < str.Length; ++i)
    {
        int codePoint = Char.ConvertToUtf32(str, i);
        // Above U+FFFF the code point uses two chars, so skip the low surrogate.
        if (codePoint > 0xffff)
        {
            i++;
        }
    }

On each iteration, codePoint holds the current code point as a 32-bit integer, whether it took one char or two in the string.
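As a rough sketch of how that loop might be used for the display tester, the version below collects the code points into a list. The names CodePoints and Enumerate are made up for illustration, and the code assumes the input is well-formed UTF-16: Char.ConvertToUtf32 throws an ArgumentException when it meets an unpaired surrogate.

```csharp
using System;
using System.Collections.Generic;

static class CodePoints
{
    // Hypothetical helper: returns every code point of str as a 32-bit integer.
    // Assumes str contains no unpaired surrogates.
    public static List<int> Enumerate(string str)
    {
        var result = new List<int>();
        for (int i = 0; i < str.Length; ++i)
        {
            int codePoint = Char.ConvertToUtf32(str, i);
            result.Add(codePoint);
            // A code point above U+FFFF used two chars; step over the second one.
            if (codePoint > 0xFFFF)
            {
                i++;
            }
        }
        return result;
    }
}
```

The glyph list can then be keyed by int rather than char. If a string is needed again, Char.ConvertFromUtf32(codePoint) turns a code point back into a one-char or two-char string.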
