String and 4-byte Unicode characters

I have a question about strings and characters in C#. I found that a C# string is a Unicode string and that a char takes 2 bytes, so each char is a UTF-16 code unit. That is fine, but I also read on Wikipedia that some characters take 4 bytes in UTF-16.
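A minimal sketch of what the Wikipedia statement means in practice, assuming a .NET console project. The character U+1D49C (MATHEMATICAL SCRIPT CAPITAL A) is only an illustrative choice of a code point outside the Basic Multilingual Plane. It shows that one such character occupies two char values, a high surrogate followed by a low surrogate:

```csharp
using System;

public static class SurrogateDemo
{
    public static void Main()
    {
        // U+1D49C lies outside the Basic Multilingual Plane,
        // so UTF-16 stores it as two char values (a surrogate pair).
        string s = "\U0001D49C";

        Console.WriteLine(s.Length);                                 // 2
        Console.WriteLine(char.IsHighSurrogate(s[0]));               // True
        Console.WriteLine(char.IsLowSurrogate(s[1]));                // True
        Console.WriteLine(char.ConvertToUtf32(s, 0).ToString("X"));  // 1D49C
    }
}
```

Indexing such a string with s[i] therefore returns half of the character, which is exactly the problem described below.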

I am writing a program that lets you draw characters for alphanumeric displays. It also has a tester in which you can type a line of text, and it draws that text so you can see how it looks.

So how should I handle strings in which the user types a character that takes 4 bytes, i.e. two chars? I need to walk through the string char by char, find each char in the list, and draw it on the panel.

2 answers

You could do:

    for (int i = 0; i < str.Length; ++i)
    {
        int codePoint = Char.ConvertToUtf32(str, i);
        if (codePoint > 0xffff)
        {
            i++; // skip the low surrogate of the pair
        }
        // look up and draw codePoint here
    }

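Inside the loop, codePoint holds the complete code point (0 to 0x10FFFF) as a 32-bit int. When it is above 0xFFFF, the character was stored as a surrogate pair, so i is advanced once more to skip the low surrogate. Note that Char.ConvertToUtf32 throws an ArgumentException if it meets an unpaired surrogate.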
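On .NET Core 3.0 or later (an assumption; the loop above does not need it), System.Text.Rune offers the same walk without manual index arithmetic: string.EnumerateRunes() yields one Rune per Unicode scalar value. This is a swapped-in alternative to the Char.ConvertToUtf32 loop, sketched with a hypothetical helper named GetScalarValues:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class RuneSplitter
{
    // Requires .NET Core 3.0 or later (System.Text.Rune, string.EnumerateRunes).
    // Each Rune is one Unicode scalar value; a surrogate pair comes back as a single Rune.
    public static List<int> GetScalarValues(string text)
    {
        var values = new List<int>();
        foreach (Rune rune in text.EnumerateRunes())
        {
            values.Add(rune.Value);
        }
        return values;
    }

    public static void Main()
    {
        // "A", then U+1D49C (stored as a surrogate pair), then "b"
        foreach (int value in GetScalarValues("A\U0001D49Cb"))
        {
            Console.WriteLine($"U+{value:X4}");
        }
        // U+0041
        // U+1D49C
        // U+0062
    }
}
```

Unlike the Char.ConvertToUtf32 loop, EnumerateRunes does not throw on an unpaired surrogate; it yields U+FFFD (Rune.ReplacementChar) in its place, which may be preferable for text typed by a user.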


Work entirely with String objects and avoid Char altogether. For example, with IndexOf:

    var needle = "\U0001D49C"; // U+1D49C MATHEMATICAL SCRIPT CAPITAL A, outside the BMP
    var hayStack = "a code point outside the Basic Multilingual Plane: \U0001D49C";
    var index = hayStack.IndexOf(needle, StringComparison.Ordinal);

Most methods of the String class have overloads that accept either a Char or a String, and many Char methods (for example Char.IsLetter) also have overloads that take a String and an index. Just do not work with single Char values.
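Applied to the drawing program from the question, this string-only approach might look like the sketch below. It is an assumption, not the answerer's code: the glyph table is a hypothetical Dictionary<string, string> keyed by each character as a string (the values stand in for whatever glyph data the program stores), and the text is split with StringInfo.GetTextElementEnumerator, which keeps a surrogate pair together as one element and also keeps a base letter together with its combining marks.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class GlyphLookup
{
    // Splits text into text elements; a surrogate pair stays together as one string.
    public static List<string> SplitIntoTextElements(string text)
    {
        var elements = new List<string>();
        TextElementEnumerator e = StringInfo.GetTextElementEnumerator(text);
        while (e.MoveNext())
        {
            elements.Add(e.GetTextElement());
        }
        return elements;
    }

    public static void Main()
    {
        // Hypothetical glyph table: key = the character as a string, value = placeholder glyph name.
        var glyphs = new Dictionary<string, string>(StringComparer.Ordinal)
        {
            ["A"] = "glyph-A",
            ["\U0001D49C"] = "glyph-script-A",
        };

        foreach (string element in SplitIntoTextElements("A\U0001D49C?"))
        {
            string glyph = glyphs.TryGetValue(element, out var g) ? g : "(missing)";
            Console.WriteLine(glyph);
        }
        // glyph-A
        // glyph-script-A
        // (missing)
    }
}
```

Keying the table by string rather than by char means a 4-byte character is looked up and drawn exactly like any other character.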

