Memory List Size

I was just experimenting with the size of python data structures in memory. I wrote the following snippet:

import sys lst1=[] lst1.append(1) lst2=[1] print(sys.getsizeof(lst1), sys.getsizeof(lst2)) 

I checked the code in the following configurations:

  • Windows 7 64bit, Python3.1: output: 52 40 , so lst1 has 52 bytes and lst2 has 40 bytes.
  • Ubuntu 11.4 32bit with Python3.2: output 48 32
  • Ubuntu 11.4 32bit Python2.7: 48 36

Can someone explain to me why the two sizes are different, although both are lists containing 1?

In the python documentation for the getizeof function, I found the following: ...adds an additional garbage collector overhead if the object is managed by the garbage collector. Could this be in my small example?

+40
python
Aug 30 '11 at 17:29
source share
3 answers

Here's a more complete interactive session that will help me explain what is happening (Python 2.6 on 32-bit Windows XP, but that doesn't matter):

 >>> import sys >>> sys.getsizeof([]) 36 >>> sys.getsizeof([1]) 40 >>> lst = [] >>> lst.append(1) >>> sys.getsizeof(lst) 52 >>> 

Note that the empty list is slightly smaller than the one with [1] . However, when an element is added, it becomes much larger.

The reason for this is the implementation details in Objects/listobject.c , in the CPython source.

Empty list

When an empty list [] , space for elements is not allocated - this can be seen in PyList_New . 36 bytes is the amount of space needed for the list data structure itself on a 32-bit machine.

Single item list

When creating a list with one element [1] , space for one element is allocated in addition to the memory required by the list structure itself. Again, this can be found in PyList_New . Given size as an argument, it computes:

 nbytes = size * sizeof(PyObject *); 

And then has:

 if (size <= 0) op->ob_item = NULL; else { op->ob_item = (PyObject **) PyMem_MALLOC(nbytes); if (op->ob_item == NULL) { Py_DECREF(op); return PyErr_NoMemory(); } memset(op->ob_item, 0, nbytes); } Py_SIZE(op) = size; op->allocated = size; 

So, we see that with size = 1 space is allocated for one pointer. 4 bytes (in my 32-bit field).

Add to an empty list

When calling append on an empty list, this is what happens:

  • PyList_Append calls app1
  • app1 asks for the size of the list (and gets 0 as an answer)
  • app1 then calls list_resize with size+1 (1 in our case)
  • list_resize has an interesting distribution strategy summarized in this comment from its source.

There he is:

 /* This over-allocates proportional to the list size, making room * for additional growth. The over-allocation is mild, but is * enough to give linear-time amortized behavior over a long * sequence of appends() in the presence of a poorly-performing * system realloc(). * The growth pattern is: 0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ... */ new_allocated = (newsize >> 3) + (newsize < 9 ? 3 : 6); /* check for integer overflow */ if (new_allocated > PY_SIZE_MAX - newsize) { PyErr_NoMemory(); return -1; } else { new_allocated += newsize; } 

Let's do some math

Let's see how the numbers that I quoted in the session at the beginning of my article are achieved.

Thus, 36 bytes is the size required by the list data structure itself on a 32-bit basis. With one element, space is allocated for one pointer, so 4 extra bytes are a total of 40 bytes. OK so far.

When app1 is called on an empty list, it calls list_resize with size=1 . According to the list_resize redistribution list_resize , the next largest available size after 1 is 4, so 4 pointers will be highlighted. 4 * 4 = 16 bytes and 36 + 16 = 52.

In fact, everything makes sense :-)

+80
Aug 30 '11 at 17:49
source share

Sorry, the previous comment was a bit harsh.

what happens is that you look at how the lists are distributed (and I think maybe you just wanted to see how big things were - in this case use sys.getsizeof() )

when something is added to the list, one of two things can happen:

  • additional element is placed in the spare space

  • additional space is required, so a new list is created and the contents copied into it, and the added thing added.

since (2) is expensive (copying things, even pointers, takes time proportional to the number of things that need to be copied, so it grows as the lists become large), we want to do this infrequently. so instead of just adding a little more space, we’ll add a whole piece. usually the size of the amount to be added is similar to what is already in use. Thus, mathematics finds out that the average cost of allocating memory, distributed over many uses, is proportional only to the size of the list.

therefore, what you see is related to this behavior. I don’t know the exact details, but I won’t be surprised if [] or [1] (or both) are special cases in which only enough memory is allocated (to save memory in these general cases), and then adding does "capture a new piece "described above, which adds more.

but I don’t know the exact details - this is exactly how dynamic arrays work. the exact implementation of lists in python will be finely tuned so that it is optimal for typical python programs. so all I really say is that you cannot trust the size of the list to indicate exactly how much it contains - it can contain extra space, and the amount of extra free space is hard to judge or predict.

ps A neat alternative to this is to create lists in the form of pairs (value, pointer) , where each pointer points to the next tuple. this way you can increase the lists gradually, although the total used memory is higher. this is a linked list (what python uses is more like a vector or a dynamic array).

[update] see Eli great answer. s / he explains that both [] and [1] stand out exactly, but adding to [] highlights an extra piece. a comment in the code is what I say above (this is called "over-allocation" and the quantity is proportional to what we have so that the average ("amortized") value is proportional to size.)

+8
Aug 30 '11 at 17:44
source share

Here's a quick demonstration of the list growth template. Changing the third argument to range () will change the output so that it does not look like comments in listobject.c, but the result is when just adding one element seems completely accurate.

 allocated = 0 for newsize in range(0,100,1): if (allocated < newsize): new_allocated = (newsize >> 3) + (3 if newsize < 9 else 6) allocated = newsize + new_allocated; print newsize, allocated 
+2
Aug 01 '13 at 21:11
source share



All Articles