Tuple () in GenExp and ListComp

Question

Tuple () in GenExp and ListComp

I have a list of some (few) elements, for example:

my_list = [1,2,3,4,5,6,7,8,9,10]

and I have a set of indices, for example:

 indexes = (1,5,9)

I need a tuple of values from a list, for example:

 tuple(my_list[x] for x in indexes)

but it turns out to be pretty slow (when it starts many times).

The index tuple does not change for every list that I run it on - is there a faster way?

I am using Python 2.5 and still get these unexpected results:

 python -m timeit -s "indexes = (1,5,9); l = [1,2,3,4,5,6,7,8,9,10]" "tuple(l[i] for i in indexes)" 100000 loops, best of 3: 3.02 usec per loop python -m timeit -s "indexes = (1,5,9); l = [1,2,3,4,5,6,7,8,9,10]" "tuple([l[i] for i in indexes])" 1000000 loops, best of 3: 0.707 usec per loop

Is this an anomaly, or is this list comprehension really much better than a generator expression?

+6

performance python list tuples

xorsyst Apr 25 '14 at 15:50

source share

3 answers

A tuple is an invariable sequence, so when it is created (and its memory allocated), it needs to know how many elements it will contain in the first place. This means that when creating a tuple from a generator expression, the generator must first be iterated, but because the generators can be consumed only once, and the elements need to be stored somewhere. So what happens can be compared with this:

 tuple(list(generator))

Now creating a list from a generator expression is slower than creating a list using list comprehension, so you can save time just by creating a list using list comprehension.

And if you have no real reason to use a tuple, i.e. if you do not require immutability, you can also save the list and not convert it to a tuple to save a little more time.

Finally, no, in fact there is no better way to iterate over indexes and query the sequence for each. Even if the indices are the same all the time, they still need to be evaluated for each list, so you still have to repeat it.

However, you can save some more time if these indexes are actually fixed. Since simple (l[1], l[5], l[9]) will be much faster than anything else;)

Here are some links from the source (3.4 is used here, but it should look like 2.x):

Creating a tuple using the built-in tuple() function is done in the PySequence_Tuple function.

If the argument is a list, then Python will handle this explicitly by calling PyList_AsTuple , which essentially selects a tuple of list length and then simply copies all the elements.

Otherwise, it will create an iterator from the argument and first try to guess the length. Since generators have no length, Python will use the default assumption of 10 and highlight the tuple of this note length, which for your tuple, we have allocated 7 spaces too much. He then iterates over the iterator and assigns each value to its place in the tuple. Subsequently, it will resize the created tuple .

Now the real difference most likely works in list display methods. List enumeration is essentially a sequence of low-level add-on lists. Thus, this works in the same way that tuples are populated in PySequence_Tuple , described above. Thus, both methods will be equal. However, the difference with the generator expressions is that they have overhead that actually creates the generator (output sequence) that needs to be sorted out. Thus, all this is superfluous, which you avoid when you have an understanding of the list.

+4

poke Apr 25 '14 at 15:59

source share

Another option, although slower than delnan's, uses __getitem__ in a cinjunction with map . However, even with import-statement, the delnan version is faster.

 In [36]: %timeit tuple(map(my_list.__getitem__,indexes)) 1000000 loops, best of 3: 653 ns per loop In [38]: %timeit itemgetter(*indexes)(my_list) 1000000 loops, best of 3: 292 ns per loop

Without ipython:

 python -m timeit -s "indexes = (1,5,9); l = [1,2,3,4,5,6,7,8,9,10]" "tuple(map(l.__getitem__,indexes))" 1000000 loops, best of 3: 0.645 usec per loop python -m timeit -s "import operator" "indexes = (1,5,9); l = [1,2,3,4,5,6,7,8,9,10]" "operator.itemgetter(*indexes)(l)" 1000000 loops, best of 3: 0.463 usec per loop

Converting to a tuple seems to make the map variant slower than the itemgetter variant:

 python -m timeit -s "indexes = (1,5,9); l = [1,2,3,4,5,6,7,8,9,10]" "map(l.__getitem__,indexes)" 1000000 loops, best of 3: 0.489 usec per loop

0

dorvak Apr 25 '14 at 16:28

source share

delnan · Accepted Answer · 2014-04-25T15:56:52+0000

operator.itemgetter (do you really need to use 2.5? Is he dead and buried.)

Besides being simpler, it should also be a little faster due to the fact that it is implemented in C. You can create an itemgetter once when you know which indexes you want, and then re-name it in many lists. It is still necessary to copy N elements and create a tuple every time, but it should do it as fast as possible.

Tuple () in GenExp and ListComp

More articles: