Why is variable1 += variable2 much faster than variable1 = variable1 + variable2?

Question

I inherited some Python code that is used to create huge tables (up to 19 columns wide by 5000 rows). It took nine seconds for the table to be drawn on the screen. I noticed that each row was added using this code:

sTable = sTable + '\n' + GetRow()

where sTable is a string.

I changed this to:

 sTable += '\n' + GetRow()

and I noticed that the table now appeared in six seconds.

And then I changed it to:

 sTable += '\n%s' % GetRow()

based on these Python performance tips (still six seconds).

Since this was called about 5,000 times, it highlighted the performance issue. But why was there such a big difference? And why didn't the compiler spot the problem in the first version and optimize it?

performance python string html python-internals

Asked by Wikis on Aug 26 '14 at 10:36

1 answer

Martijn Pieters · Accepted Answer · 2014-08-26 10:37

This isn't about using in-place += versus the binary + add. You didn't tell us the whole story. Your original version concatenated 3 strings, not just two:

 sTable = sTable + '\n' + sRow # simplified, sRow is a function call

Python tries to help out and optimizes string concatenation, both when using strobj += otherstrobj and strobj = strobj + otherstringobj, but it cannot apply this optimization when more than 2 strings are involved.

Python strings are normally immutable, but if there are no other references to the left-hand string object and it is being rebound anyway, then Python cheats and mutates the string. This avoids having to create a new string on every concatenation, which can lead to a significant speed improvement.

This is implemented in the bytecode evaluation loop. Both when using BINARY_ADD on two strings and when using INPLACE_ADD on two strings, Python delegates concatenation to a special helper function, string_concatenate(). To be able to optimize the concatenation by mutating the string, it first needs to make sure that the string has no other references to it; if only the stack and the original variable reference it, then this can be done, provided the next operation is going to replace the original variable reference.

So if there are just 2 references to the string, and the next operation is one of STORE_FAST (set a local variable), STORE_DEREF (set a variable referenced by closed-over functions) or STORE_NAME (set a global variable), and the affected variable currently references the same string, then that target variable is cleared to reduce the number of references to just 1, the stack.
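The "no other references" condition can be probed with a small sketch. The function names here are hypothetical and not part of the original answer. The second function keeps an extra name bound to the growing string, which, going by the description above, should prevent the in-place path. Both functions build the same string; any speed difference is CPython-specific and should be measured (for example with timeit) rather than assumed.

```python
def concat_sole_reference(parts):
    # res is the only name bound to the growing string, so CPython
    # may be able to resize it in place on each iteration.
    res = ''
    for part in parts:
        res = res + part
    return res


def concat_with_alias(parts):
    # alias holds a second reference to the string while the addition
    # runs, so the reference-count condition described above is not met.
    res = ''
    for part in parts:
        alias = res
        res = res + part
    return res


parts = ['row %d\n' % i for i in range(1000)]
# Same result either way; only the amount of copying may differ.
assert concat_sole_reference(parts) == concat_with_alias(parts)
```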

This is why your original code could not use this optimization fully. The first part of your expression is sTable + '\n', and the next operation is another BINARY_ADD:

 >>> import dis
 >>> dis.dis(compile(r"sTable = sTable + '\n' + sRow", '<stdin>', 'exec'))
   1           0 LOAD_NAME                0 (sTable)
               3 LOAD_CONST               0 ('\n')
               6 BINARY_ADD
               7 LOAD_NAME                1 (sRow)
              10 BINARY_ADD
              11 STORE_NAME               0 (sTable)
              14 LOAD_CONST               1 (None)
              17 RETURN_VALUE

The first BINARY_ADD is followed by a LOAD_NAME to access the sRow variable, not a store operation. This first BINARY_ADD must therefore always produce a new string object, which gets ever larger as sTable grows, so creating it takes more and more time.

You changed this code to:

 sTable += '\n%s' % sRow

which removed the second concatenation. Now the bytecode is:

 >>> dis.dis(compile(r"sTable += '\n%s' % sRow", '<stdin>', 'exec'))
   1           0 LOAD_NAME                0 (sTable)
               3 LOAD_CONST               0 ('\n%s')
               6 LOAD_NAME                1 (sRow)
               9 BINARY_MODULO
              10 INPLACE_ADD
              11 STORE_NAME               0 (sTable)
              14 LOAD_CONST               1 (None)
              17 RETURN_VALUE

and all we have left is an INPLACE_ADD followed by a store. Now sTable can be altered in place, without producing an ever larger new string object.
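The disassemblies above come from an older CPython; newer releases (3.11 and later) fold BINARY_ADD and INPLACE_ADD into a generic BINARY_OP, so the listings will look different there. As a rough sketch, the helper below (its names are hypothetical) checks the property that matters in this answer, namely whether each add is immediately followed by a store, while tolerating both opcode spellings.

```python
import dis


def is_add(ins):
    # Older CPython: BINARY_ADD / INPLACE_ADD. CPython 3.11+: a generic
    # BINARY_OP whose argrepr is '+' or '+='.
    if ins.opname in ('BINARY_ADD', 'INPLACE_ADD'):
        return True
    return ins.opname == 'BINARY_OP' and ins.argrepr in ('+', '+=')


def adds_followed_by_store(source):
    """For each add in source, report whether a STORE_* comes right after."""
    code = compile(source, '<example>', 'exec')
    instructions = list(dis.get_instructions(code))
    return [
        instructions[i + 1].opname.startswith('STORE_')
        for i, ins in enumerate(instructions[:-1])
        if is_add(ins)
    ]


three_value = adds_followed_by_store(r"sTable = sTable + '\n' + sRow")
two_value = adds_followed_by_store(r"sTable += '\n%s' % sRow")
print(three_value)  # first add feeds another add, only the last one is stored
print(two_value)    # the single add is stored straight away
```

Only an add whose flag is true is a candidate for the in-place path; whether the interpreter actually takes it still depends on the reference-count condition.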

You would have gotten the same speed difference with:

 sTable = sTable + ('\n%s' % sRow)

A time trial shows the difference:

 >>> import random
 >>> from timeit import timeit
 >>> testlist = [''.join([chr(random.randint(48, 127)) for _ in range(random.randrange(10, 30))]) for _ in range(1000)]
 >>> def str_threevalue_concat(lst):
 ...     res = ''
 ...     for elem in lst:
 ...         res = res + '\n' + elem
 ...
 >>> def str_twovalue_concat(lst):
 ...     res = ''
 ...     for elem in lst:
 ...         res = res + ('\n%s' % elem)
 ...
 >>> timeit('f(l)', 'from __main__ import testlist as l, str_threevalue_concat as f', number=10000)
 6.196403980255127
 >>> timeit('f(l)', 'from __main__ import testlist as l, str_twovalue_concat as f', number=10000)
 2.3599119186401367

The moral of this story is that you should not be using string concatenation in the first place. The proper way to build a new string from lots of other strings is to use a list and then use str.join():

 table_rows = []
 for something in something_else:
     table_rows += ['\n', GetRow()]
 sTable = ''.join(table_rows)
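A self-contained variant of this pattern might look as follows. Here get_row is a hypothetical stand-in for the question's GetRow(), and the row contents are invented purely for illustration. The sketch only checks that the list-and-join approach yields the same string as repeated concatenation.

```python
def get_row(index):
    # Hypothetical stand-in for GetRow(); returns one row of text.
    return 'cell-%d-a\tcell-%d-b' % (index, index)


def build_table_concat(row_count):
    # Repeated concatenation, as in the question.
    table = ''
    for index in range(row_count):
        table += '\n%s' % get_row(index)
    return table


def build_table_join(row_count):
    # Collect the pieces in a list and join them once at the end.
    pieces = []
    for index in range(row_count):
        pieces.append('\n')
        pieces.append(get_row(index))
    return ''.join(pieces)


assert build_table_concat(5000) == build_table_join(5000)
```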

This is faster still:

 >>> def str_join_concat(lst):
 ...     res = ''.join(['\n%s' % elem for elem in lst])
 ...
 >>> timeit('f(l)', 'from __main__ import testlist as l, str_join_concat as f', number=10000)
 1.7978830337524414

but you cannot beat using just '\n'.join(lst):

 >>> def nl_join_concat(lst):
 ...     res = '\n'.join(lst)
 ...
 >>> timeit('f(l)', 'from __main__ import testlist as l, nl_join_concat as f', number=10000)
 0.23735499382019043
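One caveat when switching between the two join forms: they do not produce exactly the same string. A short sketch with made-up row values shows that '\n'.join(lst) puts newlines only between the items, whereas the '\n%s' version also emits a leading newline.

```python
rows = ['alpha', 'beta', 'gamma']

prefixed = ''.join(['\n%s' % row for row in rows])
separated = '\n'.join(rows)

print(repr(prefixed))   # '\nalpha\nbeta\ngamma'
print(repr(separated))  # 'alpha\nbeta\ngamma'

# For a non-empty list, prepending one newline makes the results identical.
assert '\n' + separated == prefixed
```

If the leading newline matters to whatever consumes the table, it can be added once after the join instead of once per row.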
