How to handle multibyte strings in Python

PHP has multibyte string functions for processing multibyte strings, such as CJK text. In Python, I want to count the characters in a multibyte string with the built-in len function, but it returns the wrong result: the number of bytes in the string rather than the number of characters.

# -*- coding: utf-8 -*-
japanese = "桜の花びらたち"
print japanese
print len(japanese)  # returns 21 instead of 7

Is there a package or function like PHP's mb_strlen?
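For readers on Python 3, here is a minimal sketch that reproduces the same mismatch. It assumes that a Python 2 "..." literal corresponds to Python 3's bytes type, while Python 3's str is already Unicode. The variable names text and raw are illustrative only.

```python
# Python 3 sketch: the same text measured as str vs. as bytes
text = "桜の花びらたち"        # str: a sequence of Unicode code points
raw = text.encode("utf-8")    # bytes: the UTF-8 encoding of that text

print(len(text))  # 7  (characters)
print(len(raw))   # 21 (bytes)
```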


Use Unicode strings:

# -*- coding: utf-8 -*-

japanese = u"桜の花びらたち"
print japanese
print len(japanese)  # returns 7

Note the u prefix before the string literal.
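In Python 3 the u prefix is no longer needed, because plain string literals are already Unicode; Python 3.3 and later still accept u"..." for compatibility (PEP 414). A small sketch, assuming Python 3.3+, showing that both spellings give the same str value:

```python
# Python 3.3+: the u prefix is accepted but redundant
a = u"桜の花びらたち"
b = "桜の花びらたち"

print(type(a) is str, a == b)  # True True
print(len(a))                  # 7
```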

To convert a byte string to Unicode, use decode: "桜の花びらたち".decode('utf-8')
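If you want something shaped like PHP's mb_strlen, a minimal Python 3 sketch is a helper that decodes bytes first and then calls len. The name mb_strlen and its encoding parameter are hypothetical, chosen only to mirror the PHP function; this is not a standard library API. With the default strict error handling, decode raises UnicodeDecodeError if the bytes are not valid in the given encoding.

```python
def mb_strlen(s, encoding="utf-8"):
    """Hypothetical helper modeled on PHP's mb_strlen.

    Accepts str or bytes; bytes are decoded with the given encoding
    before counting, so the result is a character (code point) count.
    """
    if isinstance(s, bytes):
        s = s.decode(encoding)  # bytes -> str (Unicode)
    return len(s)               # counts code points, not bytes


print(mb_strlen("桜の花びらたち"))                  # 7
print(mb_strlen("桜の花びらたち".encode("utf-8")))  # 7
```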


Decode to unicode first:

print len(japanese.decode("utf-8"))

This prints 7. Without decoding from utf-8, len returns 21, the byte count.
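The 21 in the question comes from UTF-8 encoding: each of these seven characters takes 3 bytes. A short Python 3 sketch (Python 3 source files are UTF-8 by default, so no coding declaration is assumed) that makes the arithmetic visible; the name widths is illustrative:

```python
text = "桜の花びらたち"

# UTF-8 byte width of each character
widths = [len(ch.encode("utf-8")) for ch in text]
print(widths)       # [3, 3, 3, 3, 3, 3, 3]
print(sum(widths))  # 21, i.e. 7 characters x 3 bytes each
```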

