The first step is to write a functor that determines whether a given wchar_t is Hindi. This will be derived from std::unary_function<wchar_t, bool>. The implementation is trivial: return c >= 0x0900 && c < 0x0980; (that is the Devanagari block, U+0900 to U+097F). The second step uses it: std::find_if(begin, end, is_hindi()).
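A minimal sketch of those two steps, assuming C++11 or later. The names is_hindi and contains_hindi are illustrative. One deliberate deviation from the text: the functor does not derive from std::unary_function, because that base class was deprecated in C++11 and removed in C++17, and std::find_if only needs a callable operator().

```cpp
#include <algorithm>
#include <string>

// Predicate functor: true if c lies in the Unicode Devanagari block
// (U+0900..U+097F), which this answer treats as "Hindi".
// Not derived from std::unary_function (removed in C++17);
// std::find_if only requires operator().
struct is_hindi {
    bool operator()(wchar_t c) const {
        return c >= 0x0900 && c < 0x0980;
    }
};

// Hypothetical helper: true if the wide string contains at least one
// wchar_t in the Devanagari range.
bool contains_hindi(const std::wstring& text) {
    return std::find_if(text.begin(), text.end(), is_hindi()) != text.end();
}
```

For example, contains_hindi(L"abc \u0915\u094B") is true, while contains_hindi(L"hello") is false.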
Since you will need Unicode, you should probably use wchar_t and therefore std::wstring. Neither std::string nor Glib::ustring supports Unicode. On some systems (notably Windows), wchar_t is only 16 bits wide, which limits it to the Basic Multilingual Plane, but that should be enough for 99.9% of the world's population.
You will need to convert to and from UTF-8 for I/O, but the advantage of "one character = one wchar_t" is significant. For example, std::wstring::substr() will work sensibly. However, you may still have problems with "characters" such as U+094B (DEVANAGARI VOWEL SIGN O): when iterating over a std::wstring, it will show up as a character in its own right instead of as a modifier of the preceding consonant. This is still better than std::string with UTF-8, where you end up iterating over the individual bytes of U+094B. And to take just your original example, none of the bytes of UTF-8(U+094B), which are 0xE0 0xA5 0x8B, falls in the Hindi range, so a byte-by-byte check would never match.
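To illustrate the UTF-8 point, here is a hypothetical helper, utf8_to_wstring_bmp (not part of the original answer), that decodes UTF-8 into a std::wstring. It is a sketch under strong assumptions: the input is valid UTF-8 and contains only BMP characters (1- to 3-byte sequences), so there is no validation and no handling of 4-byte sequences. A real program would use a proper conversion library instead (std::wstring_convert exists but is deprecated since C++17).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, illustration only.
// Assumes valid UTF-8 limited to the BMP, so every code point fits in
// one wchar_t even where wchar_t is 16 bits. No error checking is done.
std::wstring utf8_to_wstring_bmp(const std::string& in) {
    std::wstring out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned int b0 = static_cast<unsigned char>(in[i]);
        unsigned int cp;
        if (b0 < 0x80) {            // 1 byte: 0xxxxxxx
            cp = b0;
            i += 1;
        } else if (b0 < 0xE0) {     // 2 bytes: 110xxxxx 10xxxxxx
            cp = ((b0 & 0x1Fu) << 6)
               | (static_cast<unsigned char>(in[i + 1]) & 0x3Fu);
            i += 2;
        } else {                    // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            cp = ((b0 & 0x0Fu) << 12)
               | ((static_cast<unsigned char>(in[i + 1]) & 0x3Fu) << 6)
               | (static_cast<unsigned char>(in[i + 2]) & 0x3Fu);
            i += 3;
        }
        out.push_back(static_cast<wchar_t>(cp));
    }
    return out;
}
```

Decoding the 6-byte UTF-8 string "\xE0\xA4\x95\xE0\xA5\x8B" (U+0915 followed by U+094B) gives a 2-element std::wstring whose second element is the vowel sign U+094B on its own, which is the "modifier seen as a separate character" effect described above; the std::string holds that same vowel sign as three unrelated bytes.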
Msalters