Python efficiency: lists against tuples

I have an average number of base objects.

These base objects will be placed in collections, and these collections will be scattered around: sorted, truncated, etc.

Unfortunately, n is large enough so that the memory consumption is a little alarming, and the speed becomes relevant.

I understand that tuples are slightly more memory efficient as they are deduplicated.

In any case, I would like to know what compilations of cpu / memory lists or tuples are in Python 2.6 / 2.7.

+8
optimization python
source share
7 answers

If you have a tuple and a list with the same elements, the tuple takes up less space. Since tuples are immutable, you cannot sort them, add to them, etc. I recommend watching this talk from Alex Gaynor for a quick introduction on when to choose which data structure in Python.

UPDATE: After thinking about this, you can look at optimizing the use of the space of your objects, for example, __slots__ or using namedtuple as a proxy server instead of real objects. This is likely to result in significantly greater savings, since you have N and (presumably) only a few collections in which they appear. namedtuple in particular is super amazing; look at Raymond Hettinger .

+16
source share

As mentioned above, tuples are unchanged. Sorting a tuple (e.g. sorted(mytuple) ) returns a list that you will need to return back to the tuple.

To sort a tuple (and store it in a tuple), you will need to do this:

 mytuple = (3,2,1) mysortedtuple = tuple(sorted(mytuple)) 

To sort the list, you will need to do this:

 mylist = [3,2,1] mylist.sort() 

Since you are not casting and re-casting, the latter is more efficient in this case.

Don’t get hung up on using tuples over lists unless you have a good excuse. If you need sorted data, tuples are not suitable if they are not created in this way in the first place. Tuples are superior when the data they contain DOES NOT CHANGE, for example, with configuration settings that are loaded at run time, or with data that has already been processed.

Given that you mentioned that you are processing a large data set, you may need to use a functional programming style with generators and iterators over lists and tuples. Thus, you do not move around and create new containers, but simply bind iteration operations to achieve the final result.

Further reading:

+9
source share

What is the (average, minimum, maximum) number of base objects in a collection?

Tuples are "deduplicated", but lists are not? What do you think “deduplicated” means in this context?

Lists take up more memory than tuples, because extra memory is allocated when the presumption is that the list will grow, and you certainly don’t want to rename the memory every time you do big_list.append (). However, on a 32-bit machine, the amortized cost of an additional list item is 4 bytes for the pointer, N bytes for the item itself, and no more than 4 bytes for additional memory. N - 16 bytes for the float. This means that the float list takes up to 24 bytes for each additional float, compared to 20 bytes for a tuple. A “base object” with N == 100 gives a comparison of 108 versus 104. If based on an object are mentioned in two collections, then 58 versus 54. How big is your N?

Tip. Leave your collections as lists. Concentrate on:

  • ensuring that your underlying objects are memory efficient

  • Use generators and itertools goodies instead of temporary lists where possible

  • if you cannot avoid temporary lists, make sure they are immediately thrown away, they are no longer needed, i.e. Do not wait for the creation method to return; use explicit del as soon as possible.

+4
source share

In addition to all these suggestions, you may find that numpy will fill your needs. If your objects are what by default have numeric descriptors (ints, native C types, etc.), then that would be ideal. You can use a numpy array with custom objects, but it can be more than necessary.

+3
source share

You cannot use them equally. Tuples are immutable and do not support adding, sorting, etc. (Calling sorted on a tuple gives a list, etc.). Tuples are completely different from lists, so performance comparison does not make sense.

+2
source share

You cannot sort an immutable object - i.e. When sorting a tuple, you will always create a new one.

+1
source share

There are at least two existing questions that are quite similar to yours that the answers (or the links inside them) may be useful to you. To summarize: let functions of the type (mutable or immutable, heterogeneous or homogeneous), and not performance, determine your decision, since differences in performance / efficiency are minimal.

What is the difference between a list and tuples in Python?
What are the differences between a list, a dictionary, and a tuple in Python?

+1
source share

All Articles