>>> str1 = unicode('María','utf8')
>>> str2 = u'María'.encode('utf8')
>>> str1 == str2
False
How is this possible?
In case it matters, I'm using the IPython Notebook.
You have a unicode string and a byte string. These are not the same thing.
One is Unicode text, María. The other contains the UTF-8 encoding of that text as bytes, 'Mar\xc3\xada'.
Python 2 does an implicit conversion when comparing unicode and byte string values, but you should not rely on that conversion: it depends entirely on the default codec set for your system.
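To avoid leaning on the implicit conversion, you can convert one side explicitly so that both operands have the same type. The following is a minimal sketch, not the only way to do it; the names `text` and `data` are just illustrative, and the literals use escapes so the snippet behaves the same on Python 2 and Python 3:

```python
from __future__ import print_function

# Unicode text: 5 code points (u'\xed' is the letter i with an acute accent).
text = u'Mar\xeda'

# UTF-8 encoded bytes: 6 bytes, because that letter becomes '\xc3\xad'.
data = text.encode('utf-8')

# Compare like with like: decode the bytes, or encode the text.
same_as_text = data.decode('utf-8') == text
same_as_bytes = text.encode('utf-8') == data

print(len(text), len(data))
print(same_as_text, same_as_bytes)
```

Both explicit comparisons are true, while the lengths differ (5 versus 6), which shows that the two values really are different objects. On Python 3 there is no implicit conversion at all, so comparing text with bytes directly is simply False.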
If you don't yet know what Unicode really is, or why UTF-8 is not the same thing, or want to know anything else about encodings, see:
The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky
Python Unicode HOWTO
Pragmatic Unicode by Ned Batchelder
A string cannot be both "Unicode" and "UTF-8 encoded"; the two are mutually exclusive. Hence, different strings.