To unpack items

Once, after looking at Mike Muller's performance optimization tutorial (I think this one ), one thought began to live in my head: if performance is important, minimize access to items in a loop by index, e. d. if you need to access x[1] several times in the for x in l loop - assign the variable x[1] and reuse it in the loop.

Now I have this synthetic example:

 import timeit SEQUENCE = zip(range(1000), range(1, 1001)) def no_unpacking(): return [item[0] + item[1] for item in SEQUENCE] def unpacking(): return [a + b for a, b in SEQUENCE] print timeit.Timer('no_unpacking()', 'from __main__ import no_unpacking').timeit(10000) print timeit.Timer('unpacking()', 'from __main__ import unpacking').timeit(10000) 

unpacking() and no_unpacking() return the same result. The implementation is different: unpacking() unpacks the elements in a and b in a loop; no_unpacking() gets index values.

For python27, it shows:

 1.25280499458 0.946601867676 

In other words, unpacking() superior to no_unpacking() by ~ 25%.

The question arises:

  • Why is index access slowing more slowly (even in this simple case)?

Bonus question:

  • I also tried this on pypy - there is almost no difference between the two functions in terms of performance. Why is this?

Thanks for any help.

+7
performance python for-loop iterable-unpacking
source share
1 answer

To answer your questions, we can check the bytecode generated by two functions using the dis module:

 In [5]: def no_unpacking(): ...: s = [] ...: for item in SEQUENCE: ...: s.append(item[0] + item[1]) ...: return s ...: ...: ...: def unpacking(): ...: s = [] ...: for a,b in SEQUENCE: ...: s.append(a+b) ...: return s 

I expanded the list view because in python3 it would be more cumbersome to check interesting bytecodes. The code is equivalent, so for our purpose this does not really matter.

Bytecode for the first function:

 In [6]: dis.dis(no_unpacking) 2 0 BUILD_LIST 0 3 STORE_FAST 0 (s) 3 6 SETUP_LOOP 39 (to 48) 9 LOAD_GLOBAL 0 (SEQUENCE) 12 GET_ITER >> 13 FOR_ITER 31 (to 47) 16 STORE_FAST 1 (item) 4 19 LOAD_FAST 0 (s) 22 LOAD_ATTR 1 (append) 25 LOAD_FAST 1 (item) 28 LOAD_CONST 1 (0) 31 BINARY_SUBSCR 32 LOAD_FAST 1 (item) 35 LOAD_CONST 2 (1) 38 BINARY_SUBSCR 39 BINARY_ADD 40 CALL_FUNCTION 1 (1 positional, 0 keyword pair) 43 POP_TOP 44 JUMP_ABSOLUTE 13 >> 47 POP_BLOCK 5 >> 48 LOAD_FAST 0 (s) 51 RETURN_VALUE 

Note that the loop must call BINARY_SUBSCR twice in order to access the two elements of the tuple.

Bytecode for the second function:

 In [7]: dis.dis(unpacking) 9 0 BUILD_LIST 0 3 STORE_FAST 0 (s) 10 6 SETUP_LOOP 37 (to 46) 9 LOAD_GLOBAL 0 (SEQUENCE) 12 GET_ITER >> 13 FOR_ITER 29 (to 45) 16 UNPACK_SEQUENCE 2 19 STORE_FAST 1 (a) 22 STORE_FAST 2 (b) 11 25 LOAD_FAST 0 (s) 28 LOAD_ATTR 1 (append) 31 LOAD_FAST 1 (a) 34 LOAD_FAST 2 (b) 37 BINARY_ADD 38 CALL_FUNCTION 1 (1 positional, 0 keyword pair) 41 POP_TOP 42 JUMP_ABSOLUTE 13 >> 45 POP_BLOCK 12 >> 46 LOAD_FAST 0 (s) 49 RETURN_VALUE 

Note that there is no BINARY_SUBSCR to execute.

So, it seems that UNPACK_SEQUENCE plus one STORE_FAST (which are additional operations added by unpacking) is faster than two BINARY_SUBSCR . This is reasonable because BINARY_SUBSCR is a full-blown method call, and UNPACK_SEQUENCE and STORE_FAST are simpler operations.

You can see the difference even in simpler cases:

 In [1]: def iter_with_index(s): ...: for i in range(len(s)): ...: s[i] ...: In [2]: def iter_without_index(s): ...: for el in s:el ...: In [3]: %%timeit s = 'a' * 10000 ...: iter_with_index(s) ...: 1000 loops, best of 3: 583 us per loop In [4]: %%timeit s = 'a' * 10000 ...: iter_without_index(s) ...: 1000 loops, best of 3: 206 us per loop 

As you can see, row iteration is about 3 times slower using explicit indexing. This is all the cost of calls to BINARY_SUBSCR .

As for your second question: pypy has a JIT that is able to parse the code and creates an optimized version that avoids the overhead of indexing operations. When he realizes that the subscription is done by tuples, it is probably able to create code that does not call the tuple method, but accesses the elements directly, thereby completely removing the BINARY_SUBSCR operation.

+22
source share

All Articles