Strange behavior when comparing unicode objects with string objects

When comparing two strings in Python with is, it works fine, and when comparing a string object with a unicode object it fails, as expected. However, when comparing a string object with a unicode object converted to a string (unicode --> str), it also fails.

Demo:

Works as expected:

    >>> if 's' is 's': print "Hurrah!"
    ...
    Hurrah!

Also as expected:

    >>> if 's' is u's': print "Hurrah!"
    ...

Not expected:

    >>> if 's' is str(u's'): print "Hurrah!"
    ...

Why doesn't the third example work as expected when both objects are of the same type?

    >>> type('s')
    <type 'str'>
    >>> type(str(u's'))
    <type 'str'>
3 answers

Do not use is for this; use ==. The is operator checks whether two names refer to the same object (identity), not whether the objects have equal values. Of course, if they are the same object, they will (with rare exceptions such as NaN) be equal (==), but if they are equal, they are not necessarily the same object.
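A minimal sketch of the difference, written for Python 3 (the question uses Python 2, but is and == behave the same way here). Lists are used because each list display is guaranteed to create a new object, so the result does not depend on interning. The values are bound to variables first because Python 3.8 and later emit a SyntaxWarning for is comparisons against a literal such as 's' is 's'. The variable names are illustrative.

```python
# Two separate list displays always create two distinct objects.
a = [1, 2, 3]
b = [1, 2, 3]
c = a  # c is just another name for the object that a refers to

print(a == b)  # True: equal values
print(a is b)  # False: different objects
print(a is c)  # True: same object
print(a == c)  # True: the same object is (here) also equal to itself
```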

The fact that the first one works is an implementation detail of CPython. Because strings are immutable, the interpreter can intern small strings: each time you write the string 's' in your source code, CPython reuses the same object. However, str(u's') evidently returns a new string with the same value. This is not surprising.
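The interning described above can be observed from Python 3 with sys.intern (the Python 2 equivalent is the built-in intern()). This is a sketch of CPython behavior under the assumption that an equal string built at runtime is a separate object; since that is an implementation detail, the code only relies on what sys.intern guarantees, namely that interning equal strings yields one shared object.

```python
import sys

literal = "spam"
# Build an equal string at runtime; in CPython this is typically a new object.
built = "".join(["sp", "am"])

print(built == literal)  # True: same value
print(built is literal)  # usually False in CPython, but not guaranteed

# sys.intern returns the canonical interned copy of an equal string,
# so interning both sides makes the identity check succeed.
interned_built = sys.intern(built)
interned_literal = sys.intern(literal)
print(interned_built is interned_literal)  # True
```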

You might ask yourself: "Why intern the string 's' at all?" That is a reasonable question. After all, it is a short string; how much memory could multiple copies floating around in your source really take? The answer (I think) has to do with dictionary lookups. Because dicts with string keys are so common in Python, interning lets the interpreter speed up key matching with lightning-fast pointer comparisons, falling back to the slower strcmp only when the pointer comparison returns false.
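The dictionary argument can be sketched in Python 3 as follows. keys_match is a hypothetical helper that only models the order of checks (identity first, then hash, then full equality); it is not CPython's actual C code. The NaN lines show the identity shortcut in a real CPython dict: NaN is not equal to itself, yet looking it up with the very same object still succeeds.

```python
def keys_match(stored_key, stored_hash, probe_key, probe_hash):
    """Hypothetical model of how a hash table compares a stored key to a probe."""
    # Fast path: the same object is always a match (a pointer comparison).
    if stored_key is probe_key:
        return True
    # Cheap rejection: keys with different hashes cannot be equal.
    if stored_hash != probe_hash:
        return False
    # Slow path: full value comparison (the strcmp-like step).
    return stored_key == probe_key


key = "spam"
other = "".join(["sp", "am"])  # equal value, built at runtime
print(keys_match(key, hash(key), key, hash(key)))        # True (identity fast path)
print(keys_match(key, hash(key), other, hash(other)))    # True (equal values)
print(keys_match(key, hash(key), "eggs", hash("eggs")))  # False

# CPython's dict lookup also checks identity before equality:
nan = float("nan")
table = {nan: "found"}
print(nan == nan)  # False: NaN is never equal to itself
print(table[nan])  # found: the identity check matches the same object
```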


The is operator compares the identities of its two operands, which in CPython means their memory locations. Since strings are immutable, CPython can intern them, so 's' and 's' occupy the same place in memory.

An earlier version of this answer claimed that, due to how unicode is handled in Python 2.7, u's' and 's' are stored in the same place in memory, so 's' is u's' evaluates to True. As @mgilson points out, that is wrong: 's' and u's' are of different types and therefore cannot occupy the same memory location, which is why 's' is u's' evaluates to False.

However, calling str(u's') creates and returns a new string. Because this string is freshly created, it lives at a different location in memory, so the is comparison fails.

You really want to check that they are equivalent strings, so use ==:

    In [1]: 's' == u's'
    Out[1]: True

    In [2]: 's' == 's'
    Out[2]: True

    In [3]: 's' == str(u's')
    Out[3]: True

Use == to compare values and is to compare identities (references). If the two objects have the same id, is evaluates to True; otherwise it evaluates to False. With str(), a new object with a different id is created, so you get False.
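To illustrate the link between is and id, here is a small Python 3 sketch: for two objects that are both alive, x is y gives the same answer as id(x) == id(y) (in CPython, id() returns the object's memory address). Plain object() instances are used so the result does not depend on string interning; the names are illustrative.

```python
p = object()
q = object()  # a second, distinct object
r = p         # another name bound to the same object as p

# While both objects are alive, `x is y` agrees with id(x) == id(y).
print(p is r, id(p) == id(r))  # True True
print(p is q, id(p) == id(q))  # False False
```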
