Two apparently identical Python Unicode UTF8 encoded strings do not match

>>> str1 = unicode('María','utf8')
>>> str2 = u'María'.encode('utf8')
>>> str1 == str2
False

How is this possible?

In case it matters, I am using the IPython Notebook.

+1
python unicode utf-8
2 answers

You have a Unicode string and a byte string. They are not the same thing.

One is the Unicode string u'María'. The other holds the UTF-8 encoding of that text as bytes, 'Mar\xc3\xada'.

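Python 2 performs an implicit conversion when comparing Unicode strings with byte strings, but you should not rely on it: the conversion depends entirely on the default codec configured for your system (ASCII unless it has been changed). Bytes that the default codec cannot decode, such as '\xc3\xad' here, make the conversion fail, and the two values are then treated as unequal.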
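To make the difference concrete, here is a minimal sketch that should run on both Python 2.7 and Python 3. The names text, data and decoded are illustrative and do not come from the question, and the character í is written as the escape \xed so the example does not depend on the encoding of the source file. The safe approach is to convert explicitly so that both sides of the comparison have the same type.

```python
# -*- coding: utf-8 -*-
# Illustrative names only: text, data, decoded are not from the question.

text = u'Mar\xeda'             # Unicode string: 5 code points
data = text.encode('utf-8')    # byte string: UTF-8 encoding of the same text

print(repr(data))
print(len(text), len(data))    # the byte string is longer: \xed became 2 bytes

# Mixed comparison: unequal. Python 2 also emits a UnicodeWarning here,
# because the implicit ASCII decode of the bytes fails.
mixed_equal = (text == data)

# Explicit conversion: decode the bytes, then compare text with text ...
decoded = data.decode('utf-8')
decoded_equal = (decoded == text)

# ... or encode the text, then compare bytes with bytes.
encoded_equal = (data == text.encode('utf-8'))

print(mixed_equal, decoded_equal, encoded_equal)
```

Decoding at the input boundary and working with Unicode everywhere else is usually the simpler of the two options, since it keeps encoded bytes out of the comparison logic.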

If you still don’t know what Unicode is, or why UTF-8 is not the same, or want to know something else about encodings, see:

+8

A string cannot be both "Unicode" and "UTF-8 encoded"; the two are mutually exclusive. Consequently, these are different strings.

+3
