Python element performance is too slow

I ran into an unusual performance issue when adding a str class member in python 2.7.3. I know that accessing local variables is faster, however, in the problem below, the difference between the two loops exceeds 100 times. The one that accesses a.accum_ starts quickly, but slows down as if str iadd is O (n ^ 2) with a length of str.

Does anyone know the reason?

# Fast ( < 1sec): accum = str() for ii in range(1000000): if (ii % 10000) == 0: print 'fast cnt', ii accum += 'zzzzz\n' # Much slower ( > 5 mins): class Foo: pass a = Foo() a.accum_ = str() for ii in range(1000000): if (ii % 10000) == 0: print 'slow cnt', ii a.accum_ += 'zzzzz\n' 
+8
performance python string
source share
2 answers

In the first example, it is pretty clear that this is a case of optimizing a single link (in fact, there are two links: one of the object itself and one LOAD_FAST ; unicode_concatenate will try to reduce it to 1 before passing PyUnicode_Append control) made by CPython using this unicode_modifiable function:

 static int unicode_modifiable(PyObject *unicode) { assert(_PyUnicode_CHECK(unicode)); if (Py_REFCNT(unicode) != 1) return 0; if (_PyUnicode_HASH(unicode) != -1) return 0; if (PyUnicode_CHECK_INTERNED(unicode)) return 0; if (!PyUnicode_CheckExact(unicode)) return 0; #ifdef Py_DEBUG /* singleton refcount is greater than 1 */ assert(!unicode_is_singleton(unicode)); #endif return 1; } 

But in the second case, the instance data is stored in a Python dict , and not in a simple variable, so things are slightly different.

 a.accum_ += 'foo' 

really requires pre-fetching the value of a.accum_ and storing it on the stack. So now the line has at least three references: one from the instance dictionary, one from DUP_TOP and one from PyObject_GetAttr used by LOAD_ATTR . Therefore, Python cannot optimize this case, since changing one of them in place will affect other links.

 >>> class A: pass ... >>> a = A() >>> def func(): a.str = 'spam' print a.str return '_from_func' ... >>> a.str = 'foo' >>> a.str += func() spam 

You expect the output here to be 'spam_from_func' , but it will be different because the original value of a.str was saved by Python before func() was called.

 >>> a.str 'foo_from_func' 

Bytecode:

 >>> import dis >>> def func_class(): a = Foo() a.accum = '' a.accum += 'zzzzz\n' ... >>> dis.dis(func_class) 2 0 LOAD_GLOBAL 0 (Foo) 3 CALL_FUNCTION 0 (0 positional, 0 keyword pair) 6 STORE_FAST 0 (a) 3 9 LOAD_CONST 1 ('') 12 LOAD_FAST 0 (a) 15 STORE_ATTR 1 (accum) 4 18 LOAD_FAST 0 (a) 21 DUP_TOP 22 LOAD_ATTR 1 (accum) 25 LOAD_CONST 2 ('zzzzz\n') 28 INPLACE_ADD 29 ROT_TWO 30 STORE_ATTR 1 (accum) 33 LOAD_CONST 0 (None) 36 RETURN_VALUE 

Please note that this optimization was performed in about 2004 (CPython 2.4) so ​​that users are not slowness a += b or a = a + b , therefore it is mainly intended for simple variables and only works if the following instructions STORE_FAST (local variable), STORE_DEREF (closures) and STORE_NAME . This is not a general solution, the best way to do it in Python is to create a list and combine its elements using str.join .

CPython implementation details . If s and t are both strings, some Python implementations, such as CPython, can usually perform in-place optimizations for s = s + t or s += t assignments. when applicable this optimization makes quadratic run time much less likely. This optimization is both a version and an implementation dependent. For performance-sensitive code, it is preferable to use str.join() , which provides consistent linear concatenation of performance across versions and implementations.

+5
source share

Python strings are immutable and therefore cannot have the __iadd__ method. What you see in the first example is micro-optimization of the CPython interpreter. In the first example, the interpreter noticed that it has a local variable that has a reference count of 1. Thus, the interpreter can cheekily leave with a line change in place. Despite the fact that this violates the str contract, it will not be obvious under any circumstances that the contract was briefly violated during program execution.

In the last example, this micro-optimization is not performed, so it is so slow. It seems that optimization can be applied, so I'm not sure why it is not applied.

In general, if you are building a string, match the substrings in the list and then use str.join to create the final product.

+3
source share

All Articles