list() uses more memory than list comprehension

So I was playing with list objects and found something a bit strange: if a list is created with list(), it uses more memory than one created with a list comprehension. I am using Python 3.5.2:

```python
In [1]: import sys

In [2]: a = list(range(100))

In [3]: sys.getsizeof(a)
Out[3]: 1008

In [4]: b = [i for i in range(100)]

In [5]: sys.getsizeof(b)
Out[5]: 912

In [6]: type(a) == type(b)
Out[6]: True

In [7]: a == b
Out[7]: True

In [8]: sys.getsizeof(list(b))
Out[8]: 1008
```

From the docs:

Lists can be constructed in several ways:

  • Using a pair of square brackets to indicate an empty list: []
  • Using square brackets, separating elements with commas: [a] , [a, b, c]
  • Using list comprehension: [x for x in iterable]
  • Using the type constructor: list() or list(iterable)

But it seems that using list() consumes more memory.

And the bigger the list, the wider the gap.

[graph: difference in memory between list() and a list comprehension]
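To reproduce the gap behind that graph, here is a minimal sketch (exact numbers will vary with CPython version and platform):

```python
import sys

# compare sizes of list() vs. a list comprehension for growing lengths
for n in (100, 1000, 10000, 100000):
    via_ctor = sys.getsizeof(list(range(n)))
    via_comp = sys.getsizeof([i for i in range(n)])
    print(n, via_ctor, via_comp, via_ctor - via_comp)
```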

Why is this happening?

UPDATE #1

Testing with Python 3.6.0b2:

```python
Python 3.6.0b2 (default, Oct 11 2016, 11:52:53)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.getsizeof(list(range(100)))
1008
>>> sys.getsizeof([i for i in range(100)])
912
```

UPDATE #2

Testing with Python 2.7.12:

```python
Python 2.7.12 (default, Jul 1 2016, 15:12:24)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.getsizeof(list(xrange(100)))
1016
>>> sys.getsizeof([i for i in xrange(100)])
920
```
Tags: python, list, cpython, list-comprehension, python-internals
2 Answers

I think you're seeing over-allocation patterns; here is a sample from the CPython source (Objects/listobject.c):

```c
/* This over-allocates proportional to the list size, making room
 * for additional growth.  The over-allocation is mild, but is
 * enough to give linear-time amortized behavior over a long
 * sequence of appends() in the presence of a poorly-performing
 * system realloc().
 * The growth pattern is:  0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...
 */
new_allocated = (newsize >> 3) + (newsize < 9 ? 3 : 6);
```
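As a rough Python rendition of that growth rule (my sketch; the real C code also adds newsize to the result to get the total capacity, and guards against overflow), you can reproduce the pattern quoted in the comment:

```python
def new_capacity(newsize):
    # mirror of the C snippet above, plus the `+ newsize` the real
    # list_resize performs after its overflow check
    return (newsize >> 3) + (3 if newsize < 9 else 6) + newsize

# simulate appending one element at a time and record each growth
capacity, capacities = 0, [0]
for size in range(1, 89):
    if size > capacity:        # buffer exhausted, the list must grow
        capacity = new_capacity(size)
        capacities.append(capacity)

print(capacities)  # [0, 4, 8, 16, 25, 35, 46, 58, 72, 88]
```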



Listing the sizes of comprehension-built lists of lengths 0-88, you can see the pattern match:

```python
import sys

# create comprehensions for sizes 0-88
comprehensions = [sys.getsizeof([1 for _ in range(l)]) for l in range(90)]

# only take those that resulted in growth compared to the previous length
steps = zip(comprehensions, comprehensions[1:])
growths = [x for x in list(enumerate(steps)) if x[1][0] != x[1][1]]

# print the results:
for growth in growths:
    print(growth)
```

Results (in the format (list length, (old total size, new total size))):

```
(0, (64, 96))
(4, (96, 128))
(8, (128, 192))
(16, (192, 264))
(25, (264, 344))
(35, (344, 432))
(46, (432, 528))
(58, (528, 640))
(72, (640, 768))
(88, (768, 912))
```



The over-allocation is done for performance reasons, allowing lists to grow without allocating more memory on every single growth (better amortized performance).
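You can watch the amortization at work by appending one element at a time and printing the size only when the buffer actually grows; most appends change nothing (a sketch; exact sizes differ across builds):

```python
import sys

lst = []
last = sys.getsizeof(lst)
for n in range(1, 90):
    lst.append(None)
    size = sys.getsizeof(lst)
    if size != last:
        # a reallocation happened here; every other append was "free"
        print("grew at length", n, ":", last, "->", size)
        last = size
```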

The likely reason for the difference is that a list comprehension cannot deterministically know the size of the resulting list, but list() can. This means the comprehension continually grows the list as it fills it up, using over-allocation, until it is finally full.

It will most likely not grow the over-allocation buffer with unused allocated nodes once it is done (in fact, in most cases it won't; that would defeat the purpose of the over-allocation).

list(), however, can add some buffer no matter the list size, since it knows the final list size in advance.
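One illustrative experiment (my sketch, not from the original answer): give list() an iterable whose length it cannot query up front, such as a generator, and compare with range, which reports its length. The two sizes typically differ, because the generator path falls back to append-style growth:

```python
import sys

sized = list(range(100))               # range exposes its length up front
unsized = list(i for i in range(100))  # a generator does not
print(sys.getsizeof(sized), sys.getsizeof(unsized))
```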




Another piece of supporting evidence, also from the source, is that we see list comprehensions invoking LIST_APPEND, which indicates the use of list_resize, which in turn indicates consuming the pre-allocation buffer without knowing how much of it will be filled. This is consistent with the behavior you're seeing.
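You can confirm the LIST_APPEND part by disassembling a comprehension (a sketch; in CPython the nested <listcomp> code object is the first constant of the compiled expression):

```python
import dis

code = compile("[x for x in data]", "<demo>", "eval")
listcomp = code.co_consts[0]  # the nested <listcomp> code object
dis.dis(listcomp)             # its loop body ends with LIST_APPEND
```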




To conclude, list() will pre-allocate more nodes as a function of the list size:

```python
>>> sys.getsizeof(list([1,2,3]))
60
>>> sys.getsizeof(list([1,2,3,4]))
64
```

A list comprehension does not know the list size, so it uses append operations as it grows, depleting the pre-allocation buffer:

```python
# one item before filling the pre-allocation buffer completely
>>> sys.getsizeof([i for i in [1,2,3]])
52
# fills the pre-allocation buffer completely
# note that size did not change, we still have buffered unused nodes
>>> sys.getsizeof([i for i in [1,2,3,4]])
52
# grows the pre-allocation buffer
>>> sys.getsizeof([i for i in [1,2,3,4,5]])
68
```

Thanks to everyone who helped me understand this awesome Python behavior.

I didn't want to make the question massive (which is why I'm posting this as an answer); I just want to show and share my thoughts.

As @ReutSharabani correctly noted, "list() deterministically determines the size of the list". You can see it on this graph.

[graph of sizes]

When you append or use a list comprehension, you always have some sort of boundaries that expand when you reach some point. And with list() you have almost the same boundaries, but they are floating.

UPDATE

So, thanks to @ReutSharabani, @tavo and @SvenFestersen.

To summarize: list() pre-allocates memory based on the size of the final list, while a list comprehension cannot do this (it requests more memory as it needs it, like .append() does). That's why list() ends up holding more memory.

Another graph showing that list() pre-allocates memory: the green line shows list(range(830)) being appended to element by element, and for a while the memory does not change.

[graph: list() pre-allocates memory]
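A quick check of the green line's claim (a sketch that assumes the over-allocation behavior described in the accepted answer):

```python
import sys

lst = list(range(830))
before = sys.getsizeof(lst)
lst.append(830)                    # should land in the pre-allocated slack
print(before, sys.getsizeof(lst))  # the two numbers should be equal
```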

UPDATE 2

As @Barmar noted in the comments, list() should also be faster than a list comprehension, so I ran timeit() with number=1000 for list lengths from 4**0 to 4**10. Here are the results:

[graph: time measurement]
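Here is a sketch of that measurement (timeit's globals parameter needs Python 3.5+; absolute times are machine-dependent):

```python
from timeit import timeit

for exp in range(11):  # list lengths 4**0 .. 4**10
    n = 4 ** exp
    t_ctor = timeit("list(range(n))", globals={"n": n}, number=1000)
    t_comp = timeit("[i for i in range(n)]", globals={"n": n}, number=1000)
    print(n, t_ctor, t_comp)
```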



