To answer your questions, we can check the bytecode generated for the two functions using the dis module:
```python
In [5]: def no_unpacking():
   ...:     s = []
   ...:     for item in SEQUENCE:
   ...:         s.append(item[0] + item[1])
   ...:     return s
   ...:
   ...:
   ...: def unpacking():
   ...:     s = []
   ...:     for a, b in SEQUENCE:
   ...:         s.append(a + b)
   ...:     return s
```
I expanded the list comprehension into an explicit loop because in Python 3 it would be more cumbersome to check the interesting bytecodes (the comprehension body is compiled into a separate code object). The code is equivalent, so for our purpose this does not really matter.
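As a quick sanity check, the expanded loop produces the same result as the comprehension it came from. This is a sketch with made-up data; SEQUENCE is assumed to be an iterable of 2-tuples:

```python
# Made-up test data: SEQUENCE is assumed to be an iterable of 2-tuples.
SEQUENCE = [(0, 1), (2, 3), (4, 5)]

def unpacking():
    s = []
    for a, b in SEQUENCE:
        s.append(a + b)
    return s

# The explicit loop and the comprehension compute the same list.
print(unpacking() == [a + b for a, b in SEQUENCE])  # True
```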
Bytecode for the first function:
```
In [6]: dis.dis(no_unpacking)
  2           0 BUILD_LIST               0
              3 STORE_FAST               0 (s)

  3           6 SETUP_LOOP              39 (to 48)
              9 LOAD_GLOBAL              0 (SEQUENCE)
             12 GET_ITER
        >>   13 FOR_ITER                31 (to 47)
             16 STORE_FAST               1 (item)

  4          19 LOAD_FAST                0 (s)
             22 LOAD_ATTR                1 (append)
             25 LOAD_FAST                1 (item)
             28 LOAD_CONST               1 (0)
             31 BINARY_SUBSCR
             32 LOAD_FAST                1 (item)
             35 LOAD_CONST               2 (1)
             38 BINARY_SUBSCR
             39 BINARY_ADD
             40 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             43 POP_TOP
             44 JUMP_ABSOLUTE           13
        >>   47 POP_BLOCK

  5     >>   48 LOAD_FAST                0 (s)
             51 RETURN_VALUE
```
Note that the loop must call BINARY_SUBSCR twice in order to access the two elements of the tuple.
Bytecode for the second function:
```
In [7]: dis.dis(unpacking)
  9           0 BUILD_LIST               0
              3 STORE_FAST               0 (s)

 10           6 SETUP_LOOP              37 (to 46)
              9 LOAD_GLOBAL              0 (SEQUENCE)
             12 GET_ITER
        >>   13 FOR_ITER                29 (to 45)
             16 UNPACK_SEQUENCE          2
             19 STORE_FAST               1 (a)
             22 STORE_FAST               2 (b)

 11          25 LOAD_FAST                0 (s)
             28 LOAD_ATTR                1 (append)
             31 LOAD_FAST                1 (a)
             34 LOAD_FAST                2 (b)
             37 BINARY_ADD
             38 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             41 POP_TOP
             42 JUMP_ABSOLUTE           13
        >>   45 POP_BLOCK

 12     >>   46 LOAD_FAST                0 (s)
             49 RETURN_VALUE
```
Note that there is no BINARY_SUBSCR to execute.
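You can verify this programmatically with dis.Bytecode instead of eyeballing the disassembly. A sketch (on CPython 3.11+ some opcode names in the listings above differ, e.g. BINARY_ADD became BINARY_OP, but UNPACK_SEQUENCE is still used):

```python
import dis

def unpacking(seq):
    s = []
    for a, b in seq:
        s.append(a + b)
    return s

# Collect the opcode names the function compiles to.
opnames = [instr.opname for instr in dis.Bytecode(unpacking)]

print("UNPACK_SEQUENCE" in opnames)  # True: unpacking uses this opcode
print("BINARY_SUBSCR" in opnames)    # False: no subscript opcode at all
```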
So it seems that UNPACK_SEQUENCE plus one extra STORE_FAST (the additional operations introduced by unpacking) are faster than two BINARY_SUBSCRs. This is reasonable because BINARY_SUBSCR is a full-blown method call, while UNPACK_SEQUENCE and STORE_FAST are simpler operations.
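You can reproduce the comparison as a self-contained benchmark. This is a sketch; the SEQUENCE data and repetition counts below are made up, and absolute timings will vary by machine and Python version:

```python
import timeit

# Made-up test data standing in for the SEQUENCE used above.
SEQUENCE = [(i, i + 1) for i in range(10_000)]

def no_unpacking():
    s = []
    for item in SEQUENCE:
        s.append(item[0] + item[1])
    return s

def unpacking():
    s = []
    for a, b in SEQUENCE:
        s.append(a + b)
    return s

# Time both versions over the same number of runs.
t_index = timeit.timeit(no_unpacking, number=200)
t_unpack = timeit.timeit(unpacking, number=200)
print(f"indexing:  {t_index:.3f}s")
print(f"unpacking: {t_unpack:.3f}s")
```

On CPython the unpacking version typically comes out ahead, for the reasons discussed above.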
You can see the difference even in simpler cases:
```python
In [1]: def iter_with_index(s):
   ...:     for i in range(len(s)):
   ...:         s[i]
   ...:

In [2]: def iter_without_index(s):
   ...:     for el in s: el
   ...:

In [3]: %%timeit s = 'a' * 10000
   ...: iter_with_index(s)
   ...:
1000 loops, best of 3: 583 us per loop

In [4]: %%timeit s = 'a' * 10000
   ...: iter_without_index(s)
   ...:
1000 loops, best of 3: 206 us per loop
```
As you can see, plain iteration is almost 3 times faster than iteration with explicit indexing. The difference is essentially the cost of the calls to BINARY_SUBSCR.
As for your second question: PyPy has a JIT that traces the running code and produces an optimized version that avoids the overhead of the indexing operations. When it detects that the subscript is applied to a tuple, it is probably able to emit code that accesses the elements directly rather than going through the generic tuple method, thereby removing the BINARY_SUBSCR operation entirely.