Strange result in Python

Can someone explain this strange result in Python 2.6.6?

    >>> a = "xx"
    >>> b = "xx"
    >>> a.__hash__() == b.__hash__()
    True
    >>> a is b
    True # ok.. was just to be sure
    >>> a = "x" * 2
    >>> b = "x" * 2
    >>> a.__hash__() == b.__hash__()
    True
    >>> a is b
    True # yeah.. looks ok so far !
    >>> n = 2
    >>> a = "x" * n
    >>> b = "x" * n
    >>> a.__hash__() == b.__hash__()
    True # still okay..
    >>> a is b
    False # hey! What the F... ?
2 answers

To understand this, you need to understand several different things.

  • a is b returns True if a and b are the same object, not merely if they have the same value. Two strings may have the same value and still be different instances of that value.
  • When you say a = "x" , what you are actually doing is creating the string constant "x" and then giving it a name, a . String constants are strings that are written literally in the code rather than computed. In CPython, string constants like this one are interned, which means they are stored in a table for reuse: if you say a = "a"; b = "a" , this is actually the same as saying a = "a"; b = a , because both names will use the same interned string "a" . Therefore, the first a is b is True.
  • When you say a = "x" * 2 , the Python compiler actually optimizes this. It computes the string at compile time and generates code as if you had written a = "xx" . So the resulting string "xx" is interned. Therefore, the second a is b is True.
  • When you say a = "x" * n , the Python compiler does not know what n is at compile time. It is therefore forced to emit the string "x" and then perform the string multiplication at run time. Since the multiplication happens at run time, the string "x" is interned but the resulting string "xx" is not. As a result, each of these strings is a different instance of "xx" , so the final a is b is False.
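The distinction in the points above can be reproduced without depending on what the compiler happens to intern. The following is a minimal sketch, written for Python 3 rather than the 2.6 of the question. build_repeated is a hypothetical helper introduced only for illustration, and the identity result is a CPython implementation detail that other interpreters or versions may not share.

```python
def build_repeated(char, count):
    """Build a string at run time so the compiler cannot precompute it."""
    return char * count


n = 2
a = build_repeated("x", n)
b = build_repeated("x", n)

# Equality compares values and holds on every Python implementation.
print(a == b)
# Equal strings always hash equally within one interpreter run.
print(hash(a) == hash(b))
# Identity compares objects; on CPython this usually prints False here,
# but that is an implementation detail, so do not rely on it either way.
print(a is b)
```

The first two checks are guaranteed by the language; only the last one depends on how the interpreter allocates strings.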

You can see the difference for yourself:

    def a1():
        a = "x"
    def a2():
        a = "x" * 2
    def a3():
        n = 2
        a = "x" * n

    import dis
    print "a1:"
    dis.dis(a1)
    print "a2:"
    dis.dis(a2)
    print "a3:"
    dis.dis(a3)

In CPython 2.6.4, this outputs:

    a1:
      4           0 LOAD_CONST               1 ('x')
                  3 STORE_FAST               0 (a)
                  6 LOAD_CONST               0 (None)
                  9 RETURN_VALUE
    a2:
      6           0 LOAD_CONST               3 ('xx')
                  3 STORE_FAST               0 (a)
                  6 LOAD_CONST               0 (None)
                  9 RETURN_VALUE
    a3:
      8           0 LOAD_CONST               1 (2)
                  3 STORE_FAST               0 (n)

      9           6 LOAD_CONST               2 ('x')
                  9 LOAD_FAST                0 (n)
                 12 BINARY_MULTIPLY
                 13 STORE_FAST               1 (a)
                 16 LOAD_CONST               0 (None)
                 19 RETURN_VALUE
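The disassembly above is from CPython 2.6.4, and the exact bytecode differs between versions. A less version-sensitive way to check whether the multiplication was computed at compile time is to look at the constants stored in each function's code object. This is a sketch assuming a recent CPython 3. The function names folded and not_folded are made up for illustration, and constant folding is an optimization the interpreter is free to skip.

```python
def folded():
    a = "x" * 2      # both operands are literals
    return a


def not_folded():
    n = 2
    a = "x" * n      # n is a variable, so this is evaluated at run time
    return a


# co_consts holds the constants the compiler stored for each function.
print(folded.__code__.co_consts)
print(not_folded.__code__.co_consts)

# On CPython, a precomputed "xx" is expected only among the constants
# of the first function; the second stores just "x" and 2.
print("xx" in folded.__code__.co_consts)
print("xx" in not_folded.__code__.co_consts)
```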

Finally, note that you can say a = intern(a); b = intern(b) to create interned versions of the strings, which guarantees that a is b is True. However, if all you want is to check whether two strings are equal, just use a == b .
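In Python 3 the built-in intern moved to sys.intern. The sketch below assumes Python 3 and uses a variable n only to force the strings to be built at run time. It interns two equal strings so that they refer to the same object, next to the plain equality check that is normally all you need.

```python
import sys

n = 2
a = "x" * n
b = "x" * n

# sys.intern returns the single shared copy for a given string value,
# so two equal strings map to the same object after interning.
a = sys.intern(a)
b = sys.intern(b)
print(a is b)  # True

# For ordinary comparisons, equality is the right tool.
print(a == b)  # True
```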


The is operator tells you whether two variables point to the same object in memory. This is rarely useful, and it is often confused with the == operator, which tells you whether two objects "look the same."

This is especially confusing when used with things like short string literals, because the Python compiler interns them for efficiency. In other words, when you write "xx" , the compiler emits bytecode that creates one string object in memory and makes all the "xx" literals point to it. This explains why your first two comparisons are True. Note that you can get the identity of the strings by calling id on them, which (at least on CPython, probably) is their address in memory:

    >>> a = "xx"
    >>> b = "xx"
    >>> id(a)
    38646080
    >>> id(b)
    38646080
    >>> a is b
    True
    >>> a = "x"*10000
    >>> b = "x"*10000
    >>> id(a)
    38938560
    >>> id(b)
    38993504
    >>> a is b
    False

The third comparison is False because the compiler did not intern the strings a and b , for whatever reason (perhaps because it is not smart enough to notice that the variable n is defined once and then never changed).

You can actually get Python to intern strings by, well, asking it to. This might give you a small performance boost and may help. It is probably useless.

Moral: do not use is with string literals. Or int literals. Or anywhere you don't mean it, really.
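As a sketch of a case where is does say what you mean: identity is the usual way to test for a singleton such as None, while values are compared with == . The function describe and its inputs are hypothetical and only illustrate the rule; the code assumes Python 3.

```python
def describe(value=None):
    """Return a short label for value, treating None as 'missing'."""
    # Identity is appropriate here: there is exactly one None object.
    if value is None:
        return "missing"
    # Values such as strings and ints are compared with ==.
    if value == "xx":
        return "double x"
    return "other"


n = 2
print(describe())          # missing
print(describe("x" * n))   # double x
print(describe(0))         # other
```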
