I need to split a Unicode string into its individual (user-perceived) characters.
For example, in Tamil:
"கமலி" => 'க', 'ம', 'லி'
I can split the string into UTF-16 code units, but grouping them back into whole characters has become a problem:
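    using System.Diagnostics;
    using System.Text;

    // Each char below is one UTF-16 code unit, not one visible character.
    byte[] stringBytes = Encoding.Unicode.GetBytes("கமலி");
    char[] stringChars = Encoding.Unicode.GetChars(stringBytes);
    foreach (var crt in stringChars)
    {
        Trace.WriteLine(crt);
    }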
It gives this result:

    'க' => 0x0B95
    'ம' => 0x0BAE
    'ல' => 0x0BB2
    'ி' => 0x0BBF
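To see why the last letter falls apart, it can help to print each UTF-16 code unit together with its Unicode general category. This is a small diagnostic sketch; the class and method names (`CodeUnitDump`, `DescribeCodeUnits`) are made up for illustration. It uses only `char.GetUnicodeCategory`, and it should show the first three code units as letters and 'ி' (U+0BBF, TAMIL VOWEL SIGN I) as a spacing combining mark, i.e. a vowel sign that attaches to the preceding consonant.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class CodeUnitDump
{
    // Hypothetical helper: describes every UTF-16 code unit in the string.
    // Iterating a string yields chars (code units), not visible characters.
    public static List<string> DescribeCodeUnits(string s)
    {
        var lines = new List<string>();
        foreach (char c in s)
        {
            UnicodeCategory category = char.GetUnicodeCategory(c);
            lines.Add($"'{c}' => 0x{(int)c:X4} ({category})");
        }
        return lines;
    }

    public static void Main()
    {
        foreach (string line in DescribeCodeUnits("கமலி"))
        {
            Console.WriteLine(line);
        }
    }
}
```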
So the problem is how to keep "லி" together as one character instead of separating it into 'ல' and 'ி'.
Representing a consonant and its vowel sign as separate code points is normal in Indic scripts, but it makes parsing difficult in C#.
All I need is to break the string into 3 characters.
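One way to get those 3 pieces is to enumerate text elements (grapheme clusters) instead of chars, using `StringInfo.GetTextElementEnumerator` from `System.Globalization`. The sketch below assumes that approach; the class and helper names (`TextElementSplit`, `SplitTextElements`) are made up for illustration. Note that the exact grouping rules depend on the runtime: .NET 5 and later follow the Unicode extended grapheme cluster rules (UAX #29), while .NET Framework attaches combining marks to the preceding base character. In both cases a spacing combining mark such as 'ி' stays with 'ல', so "கமலி" should come out as 3 elements.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class TextElementSplit
{
    // Hypothetical helper: splits a string into text elements
    // (user-perceived characters) rather than UTF-16 code units.
    public static List<string> SplitTextElements(string s)
    {
        var elements = new List<string>();
        TextElementEnumerator e = StringInfo.GetTextElementEnumerator(s);
        while (e.MoveNext())
        {
            // Each text element is returned as a string, since it may
            // span several chars (here 'ல' + 'ி').
            elements.Add(e.GetTextElement());
        }
        return elements;
    }

    public static void Main()
    {
        List<string> parts = SplitTextElements("கமலி");
        Console.WriteLine(parts.Count);
        Console.WriteLine(string.Join(", ", parts));
    }
}
```

If only the count is needed, `new StringInfo("கமலி").LengthInTextElements` applies the same rules without building a list.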