Failure to optimize string sequence in CPython

Question

Why, in CPython,

def add_string(n): s = '' for _ in range(n): s += ' ' 

take linear time but

 def add_string_in_list(n): l = [''] for _ in range(n): l[0] += ' ' 

takes quadratic time?




Evidence:

 Timer(partial(add_string, 1000000)).timeit(1) #>>> 0.1848409200028982 Timer(partial(add_string, 10000000)).timeit(1) #>>> 1.1123797750042286 
 Timer(partial(add_string_in_list, 10000)).timeit(1) #>>> 0.0033865350123960525 Timer(partial(add_string_in_list, 100000)).timeit(1) #>>> 0.25131178900483064 



What i know

CPython has optimizations for adding a line when the line being added has a reference count of 1.

This is because strings in Python are immutable, and therefore they usually cannot be edited. If there are several links for a line and it is mutated, both links will see the changed line. This is clearly not required, so a mutation cannot happen with multiple links.

If there is only one link to a string, however, changing the value will only change the string for that link, which requires changing it. You can check that this is a probable reason:

 from timeit import Timer from functools import partial def add_string_two_references(n): s = '' for _ in range(n): s2 = s s += ' ' Timer(partial(add_string_two_references, 20000)).timeit(1) #>>> 0.032532954995986074 Timer(partial(add_string_two_references, 200000)).timeit(1) #>>> 1.0898985149979126 

I am not sure why the factor is only 30 times instead of the expected 100x, but I believe that this is an overhead.




What I do not know

So why does the list version create two links? Is it even that which hinders optimization?

You can verify that this does not apply to normal objects in different ways:

 class Counter: def __iadd__(self, other): print(sys.getrefcount(self)) s = Counter() s += None #>>> 6 class Counter: def __iadd__(self, other): print(sys.getrefcount(self)) l = [Counter()] l[0] += None #>>> 6 
+8
optimization python string cpython
Jun 04 '14 at 14:28
source share
2 answers

In a list-based approach, a row from index 0 of the list is taken and changed before it is returned to the list with index 0.
For this short translator, it still has an old version of the line in the list and cannot modify the place.
If you look at the source of Python , you will see that there is no support for changing the list item in place. Thus, the object (the string in this case) must be retrieved from the list, modified, and then returned.
In other words, the type list completely support agnostic of type str for the += operator.

And consider the following code:

 l = ['abc', 'def'] def nasty(): global l l[0] = 'ghi' l[1] = 'jkl' return 'mno' l[0] += nasty() 

The value of l is ['abcmno', 'jkl'] , which proves that 'abc' was taken from the list, then nasty() received the completed change to the contents of the list, the lines 'abc' and 'mno' received concatenation and the result was assigned l[0] . If nasty() was evaluated before accessing l[0] to change it in place, then the result would be 'ghimno' .

+8
Jun 04 '14 at 2:52
source share

So why does the list version create two links?

At l[0] += ' ' one link is at l[0] . One link is created temporarily to do += on.

Here are two simpler functions that show the effect:

 >>> def f(): ... l = [''] ... l[0] += ' ' ... >>> def g(): ... s = '' ... s += ' ' ... 

Dismantling them gives

 >>> from dis import dis >>> dis(f) 2 0 LOAD_CONST 1 ('') 3 BUILD_LIST 1 6 STORE_FAST 0 (l) 3 9 LOAD_FAST 0 (l) 12 LOAD_CONST 2 (0) 15 DUP_TOPX 2 18 BINARY_SUBSCR 19 LOAD_CONST 3 (' ') 22 INPLACE_ADD 23 ROT_THREE 24 STORE_SUBSCR 25 LOAD_CONST 0 (None) 28 RETURN_VALUE >>> dis(g) 2 0 LOAD_CONST 1 ('') 3 STORE_FAST 0 (s) 3 6 LOAD_FAST 0 (s) 9 LOAD_CONST 2 (' ') 12 INPLACE_ADD 13 STORE_FAST 0 (s) 16 LOAD_CONST 0 (None) 19 RETURN_VALUE 

In f , the BINARY_SUBSCR (slicing) command puts l[0] at the top of the VM stack. DUP_TOPX duplicates the top n elements in the stack. Both functions (see ceval.c ) increase the reference count; DUP_TOPX ( DUP_TOP_TWO in Py3) does this directly, and BINARY_SUBSCR uses PyObject_GetItem . So, the line reference count is now at least three.

g does not have this problem. It creates one sitelink when an item is clicked using LOAD_FAST , giving a LOAD_FAST of two, the minimum amount for an item in the VM stack, so it can do the optimization.

+6
Jun 04 '14 at 15:07
source share



All Articles