Under what circumstances do equal strings share the same reference?

I have searched web pages and Stack Overflow questions but could not find an answer to this question. My observation is that in Python 2.7.3, if you assign two variables to the same single-character string, e.g.

    >>> a = 'a'
    >>> b = 'a'
    >>> c = ' '
    >>> d = ' '

Then the variables will share the same reference:

    >>> a is b
    True
    >>> c is d
    True

This is also true for some longer strings:

    >>> a = 'abc'
    >>> b = 'abc'
    >>> a is b
    True
    >>> ' ' is ' '
    True
    >>> ' ' * 1 is ' ' * 1
    True

However, there are many cases where the reference is (unexpectedly) not shared:

    >>> a = 'ac'
    >>> b = 'ac'
    >>> a is b
    False
    >>> c = ' '
    >>> d = ' '
    >>> c is d
    False
    >>> ' ' * 2 is ' ' * 2
    False

Can someone explain the reason for this?

I suspect there may be simplifications/substitutions performed by the interpreter and/or some caching mechanism that takes advantage of the fact that Python strings are immutable to optimize some special cases, but what do I know? I tried making deep copies of strings using the str constructor and the copy.deepcopy function, but the strings still share references inconsistently.

The reason this is a problem for me is that I check for inequality of references to string attributes in some unit tests I am writing for the clone methods of new-style Python classes.

+8
python string immutability reference
3 answers

The details of when strings are cached and reused are implementation-dependent, vary from one Python version to another, and cannot be relied upon. If you want to check strings for equality, use ==, not is.
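As a minimal sketch of that advice (written in Python 3 syntax, with hypothetical variable names), the following builds two equal strings at run time. Only the == result is guaranteed by the language; whether is also holds depends on the interpreter.

```python
# Build two equal strings at run time so that no compile-time
# sharing of constants is involved.
parts = ["abc", "this", "is", "much", "longer"]
s1 = " ".join(parts)
s2 = " ".join(parts)

# Value equality is guaranteed by the language.
values_equal = (s1 == s2)

# Identity is an implementation detail and may be True or False.
same_object = (s1 is s2)

print("equal:", values_equal)
print("same object:", same_object)
```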

In CPython (the most commonly used Python implementation), string literals that occur in the source code are interned in many cases, so if the same string literal occurs twice in the source code, the two occurrences usually end up pointing to the same string object. In Python 2.x, you can also call the built-in function intern() to force interning of a particular string, but you really shouldn't.

Edit, regarding your actual aim of checking that attributes are not improperly shared between instances: this kind of check is only useful for mutable objects. For attributes of an immutable type, there is no semantic difference between shared and unshared objects. You can exclude immutable types from your tests using

    import numbers

    Immutable = basestring, tuple, numbers.Number, frozenset
    # ...
    if not isinstance(x, Immutable):    # Exclude types known to be immutable
        # ... run the identity check on x here

Note that this also excludes tuples that contain mutable objects. If you want to test those, you would need to descend into the tuples recursively.
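One way to descend into tuples could look like the sketch below. It is written for Python 3, so str and bytes stand in for basestring; the names IMMUTABLE, mutable_leaves and shares_mutable_state are hypothetical helpers invented for illustration, and the list of immutable types is an assumption you would adapt to your own classes.

```python
import numbers

# Types treated as immutable here (an assumption; extend as needed).
# str and bytes replace the Python 2 basestring.
IMMUTABLE = (str, bytes, numbers.Number, frozenset, type(None))


def mutable_leaves(obj):
    """Yield objects reachable through nested tuples that are not of a known immutable type."""
    if isinstance(obj, tuple):
        for item in obj:
            for leaf in mutable_leaves(item):
                yield leaf
    elif not isinstance(obj, IMMUTABLE):
        yield obj


def shares_mutable_state(original, clone):
    """Return True if some mutable leaf of original is the same object as a leaf of clone."""
    clone_ids = {id(leaf) for leaf in mutable_leaves(clone)}
    return any(id(leaf) in clone_ids for leaf in mutable_leaves(original))


shared = [1, 2]
a = ("x", (shared, 3))
b = ("x", (shared, 3))   # reuses the same list object
c = ("x", ([1, 2], 3))   # equal value, independent list

print(shares_mutable_state(a, b))  # True
print(shares_mutable_state(a, c))  # False
```

Strings and numbers are skipped entirely, so the check does not depend on whether the interpreter happens to share them.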

+8

In CPython, as an implementation detail, the empty string is shared, as are single-character strings whose code point is in the Latin-1 range. You should not depend on this, as it is possible to bypass this feature.

You can request that a string be interned using sys.intern; this will happen automatically in some cases:

Usually, the names used in Python programs are automatically interned, and the dictionaries used to store the attributes of a module, class, or instance have interned keys.

sys.intern is exposed so that you can use it (after profiling!) for performance:

Interning strings is useful to gain a little performance on dictionary lookup: if the keys in a dictionary are interned, and the lookup key is interned, the key comparisons (after hashing) can be done by a pointer compare instead of a string compare.

Note that intern is a built-in in Python 2.
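A small sketch of explicit interning, assuming Python 3 (where the function lives in sys) and hypothetical variable names:

```python
import sys

# Build equal strings at run time; whether they start out as separate
# objects is an implementation detail.
raw1 = "".join(["abc", " this is ", "much longer"])
raw2 = "".join(["abc", " this is ", "much longer"])

# sys.intern returns the canonical object for this string value.
interned1 = sys.intern(raw1)
interned2 = sys.intern(raw2)

print(raw1 == raw2)            # True
print(interned1 is interned2)  # True
```

Only the interned results are guaranteed to be the same object; raw1 and raw2 themselves may or may not be.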

+5

I think this is a matter of implementation and optimization. If the string is short, the objects can be (and often are?) "shared", but you can't depend on it. Once you have longer strings, you can see that they are not the same object.

    In [2]: s1 = 'abc'
    In [3]: s2 = 'abc'
    In [4]: s1 is s2
    Out[4]: True

Longer strings:

    In [5]: s1 = 'abc this is much longer'
    In [6]: s2 = 'abc this is much longer'
    In [7]: s1 is s2
    Out[7]: False

Use == to compare strings (not the is operator).

-

The OP's observation/hypothesis (in the comments below) that this might be related to the number of tokens seems to be supported by the following:

    In [12]: s1 = 'abc'
    In [13]: s2 = 'abc'
    In [14]: s1 is s2
    Out[14]: False

Compare this with the original abc example above.

+4