Trademark character length in Python 2.x

Why does

    >>> len('™')
    3

return 3 in Python 2.x?

How can I quickly fix it so that the trademark sign is counted as a single character (as it is in Python 3.x)?

1 answer

Your terminal encoding is set to UTF-8. You are counting the bytes of the encoded character:

    >>> '™'
    '\xe2\x84\xa2'
    >>> len('™')
    3

Use a unicode string to count characters instead of bytes:

    >>> u'™'
    u'\u2122'
    >>> len(u'™')
    1

Or decode the byte string using the terminal encoding:

    >>> import sys
    >>> '™'.decode(sys.stdin.encoding)
    u'\u2122'
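If you need this in more than one place, the decode step can be wrapped in a small helper. This is only a sketch: `char_len` is a hypothetical name, not something from the standard library, and the fallback to UTF-8 is an assumption for the case where `sys.stdin.encoding` is not set (for example when input is piped).

```python
import sys


def char_len(value, encoding=None):
    """Return the length of value counted as text rather than as bytes.

    Hypothetical helper: byte strings are decoded first, text is used as-is.
    """
    if isinstance(value, bytes):  # in Python 2, bytes is an alias for str
        # Assumption: fall back to UTF-8 when no terminal encoding is known.
        encoding = encoding or getattr(sys.stdin, 'encoding', None) or 'utf-8'
        value = value.decode(encoding)
    return len(value)


# The three UTF-8 bytes of the trademark sign decode to one character.
print(char_len(b'\xe2\x84\xa2', 'utf-8'))
# A unicode string is already text, so it is counted directly.
print(char_len(u'\u2122'))
```

Passing the encoding explicitly is the safer choice when you know where the bytes came from; guessing from the terminal only works for text that was actually typed into it.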

In Python 3, strings hold Unicode text, and the Python 2 str type was renamed to bytes (your input is essentially the same as b'\xe2\x84\xa2' in Python 3).
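To see the same distinction from the Python 3 side, here is a minimal sketch (the variable names are just for illustration) that encodes the trademark sign to UTF-8 and compares the two lengths:

```python
# Python 3: str holds text, bytes holds the encoded form.
text = '\u2122'                  # TRADE MARK SIGN, one character
encoded = text.encode('utf-8')   # the bytes a UTF-8 terminal would send

print(len(text))      # 1
print(len(encoded))   # 3
print(encoded)        # b'\xe2\x84\xa2'

# Decoding the bytes gives the original text back.
print(encoded.decode('utf-8') == text)  # True
```

The 3 you get in Python 2 is the length of `encoded`, not of `text`.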

You may want to read up on how Python handles Unicode.
