In the following program, I am trying to measure the length of a string with non-ASCII characters.
But I'm not sure why size() doesn't report the correct length when the string contains non-ASCII characters.
#include <iostream>
#include <string>

int main() {
    std::string s1 = "Hello";
    std::string s2 = "इंडिया"; // non-ASCII string

    std::cout << "Size of " << s1 << " is " << s1.size() << std::endl;
    std::cout << "Size of " << s2 << " is " << s2.size() << std::endl;
}
Output:
Size of Hello is 5
Size of इंडिया is 18
Live demo on Wandbox.
I used the std::wstring_convert class and got the correct string length.
#include <string>
#include <iostream>
#include <codecvt>

int main() {
    std::string s1 = "Hello";
    std::string s2 = "इंडिया"; // non-ASCII string

    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> cn;
    auto sz = cn.from_bytes(s2).size();

    std::cout << "Size of " << s2 << " is " << sz << std::endl;
}
Live demo on Wandbox.
See the documentation of std::wstring_convert for more information.
std::string::size() returns the number of char elements (bytes) in the string, not the number of characters; it knows nothing about Unicode or about how the bytes are encoded. The same goes for std::wstring::size(): it returns the number of wchar_t code units, which is also not necessarily the number of characters (for example, where wchar_t holds UTF-16, characters outside the Basic Multilingual Plane take two code units, a surrogate pair).
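To make the difference between code units and characters concrete, here is a minimal sketch (variable names are mine; it assumes a UTF-8 source and execution encoding, as in the demos above):

#include <iostream>
#include <string>

int main() {
    // The same text in three encodings; size() counts code units, not characters.
    std::string    u8text  = "इंडिया";   // UTF-8: 6 characters, 3 bytes each
    std::u16string u16text = u"इंडिया";  // UTF-16: all 6 characters are in the BMP
    std::u32string u32text = U"इंडिया";  // UTF-32: one code unit per code point

    std::cout << u8text.size()  << '\n'; // 18
    std::cout << u16text.size() << '\n'; // 6
    std::cout << u32text.size() << '\n'; // 6

    // A character outside the BMP needs a surrogate pair in UTF-16,
    // so even std::u16string::size() is not a character count.
    std::u16string smiley = u"\U0001F600";
    std::cout << smiley.size() << '\n'; // 2
}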
If you want the number of characters (code points) rather than code units, you have to decode the data according to its encoding. In your case the string is UTF-8, which is what your std::wstring_convert code decodes (note that std::wstring_convert and std::codecvt_utf8 are deprecated since C++17).
A simple way to count the code points of a UTF-8 string is to count every byte that is not a continuation byte (continuation bytes have the form 10xxxxxx):
int utf8_length(const std::string& s) {
    int len = 0;
    for (auto c : s)
        len += (c & 0xc0) != 0x80; // count every byte that is not a continuation byte
    return len;
}
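For completeness, here is a self-contained check of this counter against the strings from the question (the expected counts assume the literals are stored as UTF-8, as in the demos above):

#include <iostream>
#include <string>

// Count UTF-8 code points: every byte except a continuation byte (10xxxxxx)
// starts a new code point.
int utf8_length(const std::string& s) {
    int len = 0;
    for (auto c : s)
        len += (c & 0xc0) != 0x80;
    return len;
}

int main() {
    std::string s1 = "Hello";
    std::string s2 = "इंडिया";
    std::cout << "Size of " << s1 << " is " << utf8_length(s1) << std::endl; // prints 5
    std::cout << "Size of " << s2 << " is " << utf8_length(s2) << std::endl; // prints 6
}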