How to measure the correct size of non-ASCII characters?

In the following program, I am trying to measure the length of a string with non-ASCII characters.

But I'm not sure why size() doesn't print the correct length when the string contains non-ASCII characters.

#include <iostream>
#include <string>

int main()
{
    std::string s1 = "Hello";
    std::string s2 = "इंडिया"; // non-ASCII string
    std::cout << "Size of " << s1 << " is " << s1.size() << std::endl;
    std::cout << "Size of " << s2 << " is " << s2.size() << std::endl;
}

Output:

Size of Hello is 5
Size of इंडिया is 18

Live demo on Wandbox.

2 answers

I used the std::wstring_convert class to convert the string to UTF-32 (char32_t) and got the correct length in code points.

#include <string>
#include <iostream>
#include <codecvt>

int main()
{
    std::string s1 = "Hello";
    std::string s2 = "इंडिया"; // non-ASCII string
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> cn;
    auto sz = cn.from_bytes(s2).size();
    std::cout << "Size of " << s2 << " is " << sz << std::endl;
}

Live demo on Wandbox.

For more information, see the documentation for std::wstring_convert.
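Since std::wstring_convert and std::codecvt_utf8 are deprecated in C++17, here is a minimal sketch that performs the same UTF-8 to UTF-32 conversion by hand. The name decode_utf8 is hypothetical (not a standard function), and the code assumes the input is already valid UTF-8: it does not reject overlong forms, surrogates, or stray continuation bytes.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: decodes a UTF-8 std::string into UTF-32 code points.
// Assumes valid UTF-8 input; no validation is performed.
std::u32string decode_utf8(const std::string& s) {
    std::u32string out;
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char lead = static_cast<unsigned char>(s[i]);
        char32_t cp;
        std::size_t extra;  // number of continuation bytes after the lead byte
        if (lead < 0x80)      { cp = lead;        extra = 0; }  // 0xxxxxxx
        else if (lead < 0xE0) { cp = lead & 0x1F; extra = 1; }  // 110xxxxx
        else if (lead < 0xF0) { cp = lead & 0x0F; extra = 2; }  // 1110xxxx
        else                  { cp = lead & 0x07; extra = 3; }  // 11110xxx
        ++i;
        // Each continuation byte (10xxxxxx) contributes its low 6 bits.
        for (std::size_t k = 0; k < extra && i < s.size(); ++k, ++i)
            cp = (cp << 6) | (static_cast<unsigned char>(s[i]) & 0x3F);
        out.push_back(cp);
    }
    return out;
}
```

With the source file saved as UTF-8, decode_utf8(s2).size() should give 6 for the string in the question, the same count as the std::wstring_convert version.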


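std::string::size returns the number of bytes (char elements), not the number of characters. With Unicode, one character can occupy several bytes; in UTF-8, each of the Devanagari characters in your string takes three bytes. The same applies to std::wstring::size, which counts wchar_t units rather than characters (for example, with UTF-16 some characters take two units, and there are also combining characters).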
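One way to see where the 18 comes from is to dump the raw bytes. The sketch below uses a hypothetical helper, hex_bytes (not a standard function), that formats each byte of a std::string in hex. For the string in the question it should show six groups of the form E0 A4 xx: 6 code points of 3 bytes each.

```cpp
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: returns the bytes of s as space-separated uppercase hex.
// std::string::size() counts exactly these bytes, not characters.
std::string hex_bytes(const std::string& s) {
    std::ostringstream out;
    out << std::uppercase << std::hex << std::setfill('0');
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (i != 0) out << ' ';
        // Cast through unsigned char so bytes >= 0x80 are not sign-extended.
        out << std::setw(2) << static_cast<int>(static_cast<unsigned char>(s[i]));
    }
    return out.str();
}
```

For example, hex_bytes("Hi") returns "48 69", and the first character of "इंडिया" (U+0907) comes out as "E0 A4 87".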

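Counting characters in the general case (that is, what the reader perceives as characters) is a hard problem that needs a dedicated Unicode library. If counting code points is enough, you can decode the UTF-8 string, for example with std::wstring_convert (deprecated in C++17).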

For UTF-8, here is a simple function (it counts code points):

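#include <cstddef>
#include <string>

// Counts code points in a UTF-8 string: every byte that is NOT a
// continuation byte (bit pattern 10xxxxxx) starts a new code point.
std::size_t utf8_length(const std::string& s) {
  std::size_t len = 0;
  for (unsigned char c : s)
      len += (c & 0xc0) != 0x80;
  return len;
}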
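The test (c & 0xc0) != 0x80 works because in UTF-8 every continuation byte has the bit pattern 10xxxxxx, while the first byte of a sequence encodes the sequence length in its high bits. A small sketch of that classification, with hypothetical helper names:

```cpp
#include <cstddef>

// True for UTF-8 continuation bytes, bit pattern 10xxxxxx.
bool is_utf8_continuation(unsigned char b) {
    return (b & 0xC0) == 0x80;
}

// Length of the sequence started by lead byte b, read from its high bits:
// 0xxxxxxx -> 1, 110xxxxx -> 2, 1110xxxx -> 3, 11110xxx -> 4.
// Returns 0 for continuation bytes and for 0xF8..0xFF. This sketch does not
// reject the other invalid lead bytes (0xC0, 0xC1, 0xF5..0xF7).
std::size_t utf8_sequence_length(unsigned char b) {
    if ((b & 0x80) == 0x00) return 1;
    if ((b & 0xE0) == 0xC0) return 2;
    if ((b & 0xF0) == 0xE0) return 3;
    if ((b & 0xF8) == 0xF0) return 4;
    return 0;
}
```

Every Devanagari character in the question starts with the lead byte 0xE0 (sequence length 3), so utf8_length counts 18 / 3 = 6 code points.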
