How to check the Unicode character value range in C++

I am new to C++ and come from a non-CS background, so please excuse me if this question is naive or has already been answered.

I have a string in C++; the language is Telugu.

std::string str = "ఉంది"; // (it means exists; pronounced as Vundi)
std::string substring = str.substr(0,3);

The above substring will be "ఉ" (pronounced Vu), and its Unicode code point in hexadecimal is 0C09.

How can I get the value 0C09 from the substring? The goal is to check whether it falls in the valid range for Telugu (0C00-0C7F).

I found similar questions for Objective-C, Java, PHP, C#, etc., but I am looking specifically for C++ using std::string.

Following a suggestion in the comments, I read the article at joelonsoftware.com/articles/Unicode.html.

Let me clarify my question with more information: I am using Fedora 19 x86_64, the encoding is UTF-8, and the console displays the text correctly.

If I understand the article correctly, an ASCII character takes a single byte, while other Unicode characters take several bytes in UTF-8. The code example above reflects that: each of these Telugu characters is 3 bytes long. However, apart from explaining UTF-8, text encodings, and multibyte characters, the article gives no practical help with detecting the language of a Unicode string.
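To see that byte layout directly, here is a small sketch that renders each byte of a std::string in hex. The helper name `hexBytes` is hypothetical, not from any library. It assumes the source file is saved as UTF-8 (GCC's default input and execution charset), in which case "ఉ" (U+0C09) is stored as the three bytes E0 B0 89.

```cpp
#include <string>

// Returns the bytes of s as space-separated uppercase hex, e.g. "E0 B0 89".
// Iterating as unsigned char avoids the sign extension a plain char would get.
std::string hexBytes(const std::string& s) {
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char byte : s) {
        if (!out.empty()) out += ' ';
        out += digits[byte >> 4];    // high nibble
        out += digits[byte & 0x0F];  // low nibble
    }
    return out;
}
```

For example, `hexBytes(str.substr(0, 3))` on the string from the question should give "E0 B0 89": one character, three bytes, which is why `substr` has to be given byte offsets.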

Maybe I should rephrase my question:

How can I determine the language of a Unicode string in C++?

Thanks in advance for your help.

3 answers

If you read the first element of a std::string:

std::string str = "ఉంది"; // (it means exists; pronounced as Vundi)
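unsigned short i = str[0];  // str[0] is only the first UTF-8 byte (a char)
printf("%x %d",i,i);

"ffe0 65504"

That is just the first byte of the UTF-8 sequence (0xE0), sign-extended to 0xFFE0 because char is signed on x86_64. If you use std::wstring instead: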

std::wstring str = L"ఉంది"; // (it means exists; pronounced as Vundi)
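unsigned short i = str[0];  // with 32-bit wchar_t (Linux), str[0] is one whole code point
printf("%x %d",i,i);

"c09 3081", which is the code point 0C09 you want to test against the Telugu range.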

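A minimal sketch of that range check on top of std::wstring. It assumes a 32-bit wchar_t, as on Linux/glibc, so that each element holds one complete code point; the function names are hypothetical.

```cpp
#include <string>

// Telugu block as given in the question: U+0C00 to U+0C7F.
bool isTeluguCodePoint(char32_t cp) {
    return cp >= 0x0C00 && cp <= 0x0C7F;
}

// True if the string is non-empty and every character is in the Telugu block.
// Assumes 32-bit wchar_t, where one wchar_t equals one Unicode code point.
bool isAllTelugu(const std::wstring& ws) {
    if (ws.empty()) return false;
    for (wchar_t wc : ws) {
        if (!isTeluguCodePoint(static_cast<char32_t>(wc))) return false;
    }
    return true;
}
```

Checking every character rather than only the first one means mixed strings such as "ఉabc" are rejected; if you only care about the first character, call `isTeluguCodePoint(str[0])` instead.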

Consider using ICU, which can convert UTF-8 to UTF-16/32.

A std::string holding UTF-8 text stores bytes, not UTF-16/32 code units, so substr counts bytes rather than characters.
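ICU is not shown here. As a dependency-free alternative, the following sketch decodes the UTF-8 bytes of a std::string into std::u32string code points by hand, so that indexing works per character instead of per byte. `decodeUtf8` is a hypothetical helper; unlike ICU, it does not reject overlong encodings or surrogate code points.

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Decodes UTF-8 into code points; returns std::nullopt on malformed input.
// Simplified: overlong forms and surrogates are not rejected.
std::optional<std::u32string> decodeUtf8(const std::string& in) {
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t len;
        if (lead < 0x80)               { cp = lead;        len = 1; }  // 0xxxxxxx
        else if ((lead >> 5) == 0x06)  { cp = lead & 0x1F; len = 2; }  // 110xxxxx
        else if ((lead >> 4) == 0x0E)  { cp = lead & 0x0F; len = 3; }  // 1110xxxx
        else if ((lead >> 3) == 0x1E)  { cp = lead & 0x07; len = 4; }  // 11110xxx
        else return std::nullopt;  // stray continuation byte or invalid lead byte
        if (i + len > in.size()) return std::nullopt;  // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char cont = static_cast<unsigned char>(in[i + k]);
            if ((cont >> 6) != 0x02) return std::nullopt;  // must be 10xxxxxx
            cp = (cp << 6) | (cont & 0x3F);                // append 6 payload bits
        }
        out.push_back(cp);
        i += len;
    }
    return out;
}
```

For the string in the question, `(*decodeUtf8(str))[0]` should be 0x0C09, and the result can then be compared against 0x0C00..0x0C7F directly.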


You need to convert from your encoding (probably UTF-8, stored as char*) to wide characters (wchar_t).

You can see this post or this one for more information about this conversion.
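One way to do this conversion with only the standard library is std::mbstowcs, which decodes according to the current C locale. The sketch below assumes the program calls `std::setlocale(LC_ALL, "")` at startup and runs in a UTF-8 environment such as the Fedora setup in the question; `toWide` is a hypothetical name.

```cpp
#include <clocale>   // std::setlocale, LC_ALL (call once at program start)
#include <cstdlib>   // std::mbstowcs, std::size_t
#include <optional>
#include <string>
#include <vector>

// Converts a multibyte string to wchar_t using the current C locale.
// Returns std::nullopt if the bytes are invalid in that locale's encoding.
std::optional<std::wstring> toWide(const std::string& bytes) {
    // Decoding never yields more wide characters than there are input bytes;
    // the extra slot leaves room for the terminating null.
    std::vector<wchar_t> buf(bytes.size() + 1);
    std::size_t n = std::mbstowcs(buf.data(), bytes.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1)) {
        return std::nullopt;  // invalid multibyte sequence
    }
    return std::wstring(buf.data(), n);
}
```

Under a UTF-8 locale, `toWide("ఉంది")` should yield four wide characters, the first being 0x0C09. In the default "C" locale, non-ASCII input typically fails, which is why the `setlocale` call matters.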
