Why are tuples taking up less memory than lists?

A tuple takes up less memory in Python:

 >>> a = (1,2,3)
 >>> a.__sizeof__()
 48

whereas a list takes up more memory:

 >>> b = [1,2,3]
 >>> b.__sizeof__()
 64

What happens inside the Python memory management system?

+85
python list tuples python-internals
Oct 10 '17 at 10:04
4 answers

I assume you are using 64-bit CPython here as well (I got the same results on my 64-bit CPython 2.7). There may be differences in other Python implementations, or if you have 32-bit Python.
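If you are not sure which build you are running, a minimal sketch to check it (either line below is enough on its own) is to look at the pointer size:

 import struct
 import sys

 # Pointer size in bytes: 8 on a 64-bit build, 4 on a 32-bit build.
 print(struct.calcsize("P"))

 # Equivalent check: sys.maxsize exceeds 2**32 only on 64-bit builds.
 print(sys.maxsize > 2**32)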

Regardless of the implementation, lists are variable-sized while tuples are fixed-size.

So tuples can store their elements directly inside the struct; lists, on the other hand, need a layer of indirection (they store a pointer to the elements). This layer of indirection is a pointer, which on 64-bit systems is 64 bits, hence 8 bytes per element.
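You can see that per-element pointer cost directly. This is only a small sketch (exact header numbers depend on the build), but on the 64-bit CPython discussed here each extra element adds 8 bytes to both containers, and the remaining constant gap comes from the extra fields the list keeps, which the next paragraphs explain:

 # Tuple sizes grow by 8 bytes per element...
 print(().__sizeof__(), (1,).__sizeof__(), (1, 2).__sizeof__(), (1, 2, 3).__sizeof__())
 # ...and so do freshly created list literals; only the constant offset differs.
 print([].__sizeof__(), [1].__sizeof__(), [1, 2].__sizeof__(), [1, 2, 3].__sizeof__())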

But there is one more thing lists do: they over-allocate. Otherwise list.append would be an O(n) operation every time; to make it amortized O(1) (much faster!), the list over-allocates. But then it has to keep track of both the allocated size and the filled size (a tuple only needs to store one size, because its allocated and filled sizes are always identical). That means each list has to store another "size", which on 64-bit systems is a 64-bit integer, again 8 bytes.

So lists need at least 16 bytes more memory than tuples. Why did I say "at least"? Because of the over-allocation. Over-allocation means that more space is allocated than is actually needed, and how much is over-allocated depends on how you created the list and on its append/delete history:

 >>> l = [1,2,3]
 >>> l.__sizeof__()
 64
 >>> l.append(4)  # triggers re-allocation (with over-allocation), because the original list is full
 >>> l.__sizeof__()
 96

 >>> l = []
 >>> l.__sizeof__()
 40
 >>> l.append(1)  # re-allocation with over-allocation
 >>> l.__sizeof__()
 72
 >>> l.append(2)  # no re-alloc
 >>> l.append(3)  # no re-alloc
 >>> l.__sizeof__()
 72
 >>> l.append(4)  # still has room, so no over-allocation needed (yet)
 >>> l.__sizeof__()
 72
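The pay-off of that over-allocation is the amortized O(1) append mentioned above. Here is a rough, machine-dependent sketch (the helper name build and the absolute timings are only illustrative; what matters is that the total time grows roughly linearly with n):

 import timeit

 def build(n):
     # append n items one by one; thanks to over-allocation the total cost is O(n)
     l = []
     for i in range(n):
         l.append(i)
     return l

 for n in (10000, 100000, 1000000):
     print(n, round(timeit.timeit(lambda: build(n), number=3), 4))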

Images

I decided to create some images to accompany the explanation above. They may be useful.

This is how your example is (schematically) stored in memory. I highlighted the differences with red circles:

[image: schematic memory layout of the list and the tuple from the example, with the differences circled in red]

This is actually just an approximation, because int objects are also Python objects, and CPython even reuses small integers, so a probably more accurate representation (although not as readable) of the objects in memory would be:

[image: the same layout with the int objects drawn as separate Python objects and the small integers shared]


Note that __sizeof__ does not really return the "correct" size! It only returns the size of the stored values. However, when you use sys.getsizeof, the result is different:

 >>> import sys
 >>> l = [1,2,3]
 >>> t = (1, 2, 3)
 >>> sys.getsizeof(l)
 88
 >>> sys.getsizeof(t)
 72

There are 24 "extra" bytes. These are real: that is the garbage collector overhead that is not counted by the __sizeof__ method. This is because you generally should not use magic methods directly - use the functions that know how to handle them, in this case sys.getsizeof (which actually adds the GC overhead to the value returned by __sizeof__).
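A small sketch of that gap (the exact GC-header size is version-dependent: 24 bytes on the 2.7 build used here, 16 bytes on recent 3.x builds; objects the garbage collector does not track, such as ints, show no gap at all):

 import sys

 # sys.getsizeof(x) minus x.__sizeof__() is the GC header added to
 # container objects; ints are not GC-tracked, so their difference is 0.
 for obj in ([1, 2, 3], (1, 2, 3), 1):
     print(type(obj).__name__, sys.getsizeof(obj) - obj.__sizeof__())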

+115
Oct 10 '17 at 10:18

I will take a deeper dive into the CPython code base so we can see how the sizes are calculated. In your specific example no over-allocation was performed, so I will not touch on that.

I will use 64-bit values here, as you did.




The size of a list is calculated by the following function, list_sizeof:

 static PyObject *
 list_sizeof(PyListObject *self)
 {
     Py_ssize_t res;

     res = _PyObject_SIZE(Py_TYPE(self)) + self->allocated * sizeof(void*);
     return PyInt_FromSsize_t(res);
 }

Here Py_TYPE(self) is a macro that grabs the ob_type of self (returning PyList_Type), while _PyObject_SIZE is another macro that grabs tp_basicsize from that type. tp_basicsize is calculated as sizeof(PyListObject), where PyListObject is the instance struct.

The PyListObject structure has three fields:

 PyObject_VAR_HEAD      # 24 bytes
 PyObject **ob_item;    # 8 bytes
 Py_ssize_t allocated;  # 8 bytes

These fields have comments (which I trimmed) explaining what they are; you can read them in the CPython source. PyObject_VAR_HEAD expands to three 8-byte fields (ob_refcount, ob_type and ob_size), so it contributes 24 bytes.

So now res is:

 sizeof(PyListObject) + self->allocated * sizeof(void*) 

or

 40 + self->allocated * sizeof(void*) 

If the list instance has elements that are allocated, the second part calculates their contribution. self->allocated, as its name implies, holds the number of allocated elements.

Without any elements, the size of a list is therefore:

 >>> [].__sizeof__()
 40

i.e. the size of the instance struct.
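As a sanity check, the formula can be reproduced from Python. This is only a sketch: it assumes a 64-bit build and a fresh list literal, for which allocated equals len(l) because no over-allocation has happened yet (the helper names POINTER and HEADER are just for illustration):

 import struct

 POINTER = struct.calcsize("P")   # sizeof(void*): 8 on 64-bit builds
 HEADER = [].__sizeof__()         # size of the list instance struct: 40 here

 l = [1, 2, 3]
 print(l.__sizeof__(), HEADER + len(l) * POINTER)   # both 64 on this build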




tuple objects do not define a tuple_sizeof function. Instead, they use object_sizeof to calculate their size:

 static PyObject *
 object_sizeof(PyObject *self, PyObject *args)
 {
     Py_ssize_t res, isize;

     res = 0;
     isize = self->ob_type->tp_itemsize;
     if (isize > 0)
         res = Py_SIZE(self) * isize;
     res += self->ob_type->tp_basicsize;
     return PyInt_FromSsize_t(res);
 }

This, as it did for lists, grabs the tp_basicsize and, if the object has a non-zero tp_itemsize (meaning it has variable-length instances), multiplies the number of items in the tuple (which it gets via Py_SIZE) by tp_itemsize.

tp_basicsize again uses sizeof(PyTupleObject), where the PyTupleObject struct contains:

 PyObject_VAR_HEAD      # 24 bytes
 PyObject *ob_item[1];  # 8 bytes

So, without any elements (i.e. Py_SIZE returns 0), the size of an empty tuple is equal to sizeof(PyTupleObject):

 >>> ().__sizeof__()
 24

Huh? Well, here is a quirk I have not found an explanation for: the tp_basicsize of tuple is actually calculated as follows:

 sizeof(PyTupleObject) - sizeof(PyObject *) 

Why those extra 8 bytes are removed from tp_basicsize is something I could not find out. (See MSeifert's comment for a possible explanation.)
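The same sanity check works for tuples. Again this is only a sketch assuming a 64-bit build; the 24-byte header already reflects the tp_basicsize quirk above, and each element adds one pointer stored directly in the struct (POINTER and HEADER are illustrative helper names):

 import struct

 POINTER = struct.calcsize("P")   # sizeof(PyObject *): 8 on 64-bit builds
 HEADER = ().__sizeof__()         # tp_basicsize of tuple: 24 here

 t = (1, 2, 3)
 print(t.__sizeof__(), HEADER + len(t) * POINTER)   # both 48 on this build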




But that is basically the whole difference in your specific example. Lists additionally keep a count of allocated elements, which helps determine when to over-allocate again.

When additional items are added, lists do indeed perform this over-allocation in order to achieve O(1) appends. That leads to the larger sizes that MSeifert describes nicely in his answer.

+30
Oct 10 '17 at 12:10

MSeifert's answer covers it broadly; to keep it simple, you can think of it like this:

tuple is immutable. Once it is created, you cannot change it. Therefore you know in advance how much memory needs to be allocated for the object.

list is mutable. You can add items to it or remove items from it. It has to keep track of its current size (for its internal implementation), and it resizes as needed.

There are no free meals - these capabilities come at a cost. Hence the memory overhead for lists.

+27
Oct 10 '17 at 10:32

The tuple's size is fixed up front: when the tuple is initialized, the interpreter allocates just enough space for the data it contains, and that is the end of it, since it is immutable (it cannot be modified). A list, on the other hand, is a mutable object and therefore implies dynamic memory allocation. To avoid allocating new space every time you append to or modify the list (allocating enough space to hold the changed data and copying the old data over), it allocates extra space for future appends, modifications, and so on. That pretty much sums it up.
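A short sketch of that extra space being set aside (the sizes come from sys.getsizeof and are version-dependent; the point is that the reported size only jumps occasionally, not on every append):

 import sys

 # Append one element at a time and print the size only when it changes:
 # each re-allocation sets aside spare room for future appends.
 l = []
 last = sys.getsizeof(l)
 print("len", len(l), "->", last, "bytes")
 for i in range(32):
     l.append(i)
     size = sys.getsizeof(l)
     if size != last:
         print("len", len(l), "->", size, "bytes")
         last = size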

+3
Oct 13 '17 at 13:50


