I am new to C++ and come from a non-CS background, so please excuse me if this question is naive or has already been answered.
I have a string in C++ that contains Telugu text.
std::string str = "ఉంది";
std::string substring = str.substr(0,3);
The substring above is "ఉ" (pronounced Vu), and its Unicode code point in hexadecimal is 0C09 (U+0C09).
How can I get the value 0C09 from the substring? The goal is to check whether the substring falls in the valid range for Telugu (0C00-0C7F).
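My best guess so far is to decode the bytes by hand, as in the rough sketch below. It assumes the string is well-formed UTF-8 and that the character occupies exactly three bytes, which holds for the Telugu block but not for Unicode in general. The names decode_three_byte and is_telugu are made up for illustration, and I am not sure this is the right or idiomatic approach.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: decodes ONE three-byte UTF-8 sequence starting at `pos`.
// Returns 0 if there are fewer than three bytes left or if the bytes do not
// match the pattern 1110xxxx 10xxxxxx 10xxxxxx.
// Assumption: input is well-formed; overlong encodings are not rejected.
std::uint32_t decode_three_byte(const std::string& s, std::size_t pos = 0) {
    if (pos + 3 > s.size()) return 0;
    const unsigned char b0 = static_cast<unsigned char>(s[pos]);
    const unsigned char b1 = static_cast<unsigned char>(s[pos + 1]);
    const unsigned char b2 = static_cast<unsigned char>(s[pos + 2]);
    if ((b0 & 0xF0) != 0xE0 || (b1 & 0xC0) != 0x80 || (b2 & 0xC0) != 0x80) {
        return 0;
    }
    // Keep 4 payload bits from the lead byte and 6 from each continuation byte.
    return ((b0 & 0x0Fu) << 12) | ((b1 & 0x3Fu) << 6) | (b2 & 0x3Fu);
}

// Hypothetical helper: true if the code point lies in the Telugu block.
bool is_telugu(std::uint32_t cp) {
    return cp >= 0x0C00 && cp <= 0x0C7F;
}
```

With this, is_telugu(decode_three_byte(str.substr(0,3))) would be the check I am after, but it clearly breaks for characters that are not three bytes long.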
I have seen similar questions, but they apply to Objective-C, Java, PHP, C#, etc. I am looking specifically for a C++ solution using std::string.
Following a suggestion in the comments, I read the article at joelonsoftware.com/articles/Unicode.html .
Let me clarify my question with more information. I am using Fedora 19 x86_64 and the encoding is UTF-8. The console displays the text correctly.
If I understand the article correctly, an ASCII character takes a single byte in UTF-8, while other Unicode characters take several bytes. The code example above reflects that: each Telugu character is 3 bytes long. However, apart from explaining UTF-8, text encodings and multibyte characters, the article does not offer practical help with detecting the language of a Unicode string.
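To show what I mean about the length, here is a small sketch that compares the byte count with the number of code points. It assumes well-formed UTF-8, and count_code_points is a made-up helper name. The Telugu bytes are written as escapes in the usage note so that the result does not depend on the source file encoding.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts code points in well-formed UTF-8 by counting
// every byte that is NOT a continuation byte (pattern 10xxxxxx).
std::size_t count_code_points(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {
            ++n;  // lead byte of a sequence, or a plain ASCII byte
        }
    }
    return n;
}
```

For "ఉంది" written as "\xE0\xB0\x89\xE0\xB0\x82\xE0\xB0\xA6\xE0\xB0\xBF", size() reports 12 bytes while count_code_points reports 4 code points, which is why substr(0,3) returns only the first character.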
Maybe I should rephrase my question:
How can I determine the language of a Unicode string in C++?
Thanks in advance for your help.