Why is the size of an empty dict the same as a non-empty dict in Python?

It may be trivial, but I'm not sure I understand it. I tried searching around, but did not find a convincing answer.

    >>> import sys
    >>> sys.getsizeof({})
    140
    >>> sys.getsizeof({'Hello':'World'})
    140
    >>>
    >>> yet_another_dict = {}
    >>> for i in xrange(5000):
    ...     yet_another_dict[i] = i**2
    ...
    >>>
    >>> sys.getsizeof(yet_another_dict)
    98444

How should I understand this? Why is the empty dict the same size as the non-empty dict?

2 answers

There are two reasons for this:

  • The dictionary stores only references to objects, not the objects themselves, so its size depends on the number of references (entries) it holds, not on the size of the objects they point to.

  • More importantly, the dictionary preallocates memory for references in chunks. When you create a dictionary, it already has room for the first n references. When that space fills up, it allocates a new, larger chunk.
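
The first point can be checked directly. This is a minimal sketch, assuming CPython 3 (the transcript in the question is Python 2); the names `small` and `large` are made up for illustration:

```python
import sys

# Two one-element dicts with the same key; only the value differs in size.
small = {"key": "x"}
large = {"key": "x" * 1000000}

# sys.getsizeof is shallow: it counts the dict's own table of references,
# not the objects those references point to.
print(sys.getsizeof(small) == sys.getsizeof(large))        # True
print(sys.getsizeof(large["key"]) > sys.getsizeof(large))  # True
```

The million-character string is far bigger than the dict that refers to it, yet both dicts report the same size.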

You can observe this behavior by running the following piece of code.

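    import sys

    d = {}
    size = sys.getsizeof(d)
    print(size)                  # size of the empty dict
    i = 0                        # number of size changes seen so far
    j = 0                        # next key to insert
    while i < 3:
        d[j] = j
        j += 1
        new_size = sys.getsizeof(d)
        if size != new_size:     # the dict has just grown its table
            print(new_size)
            size = new_size
            i += 1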

This prints the following on my machine; the exact values depend on the architecture (32-bit or 64-bit):

    280
    1048
    3352
    12568
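
The same experiment can be wrapped in a function for Python 3. This is a sketch only: `size_steps` is a hypothetical helper name, it assumes CPython, and the exact byte counts will differ from the Python 2 figures above because the dict implementation has changed since.

```python
import sys

def size_steps(n_items):
    """Return the distinct sys.getsizeof values seen while inserting n_items keys."""
    d = {}
    steps = [sys.getsizeof(d)]
    for i in range(n_items):
        d[i] = i
        size = sys.getsizeof(d)
        if size != steps[-1]:    # the reported size changed: the table was resized
            steps.append(size)
    return steps

steps = size_steps(5000)
print(steps)        # a short, increasing list; values vary by version and platform
print(len(steps))   # far fewer entries than the 5000 insertions
```

The size changes only a handful of times over thousands of insertions, which is the chunked preallocation described above.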


Dictionaries in CPython allocate a small amount of key space directly in the dictionary object itself (4-8 entries, depending on version and compilation options). From dictobject.h:

    /* PyDict_MINSIZE is the minimum size of a dictionary.  This many slots are
     * allocated directly in the dict object (in the ma_smalltable member).
     * It must be a power of 2, and at least 4.  8 allows dicts with no more
     * than 5 active entries to live in ma_smalltable (and so avoid an
     * additional malloc); instrumentation suggested this suffices for the
     * majority of dicts (consisting mostly of usually-small instance dicts and
     * usually-small dicts created to pass keyword arguments).
     */
    #ifndef Py_LIMITED_API
    #define PyDict_MINSIZE 8

Note that CPython also resizes the dictionary in batches, to avoid frequent reallocations for growing dictionaries. From dictobject.c:

    /* If we added a key, we can safely resize.  Otherwise just return!
     * If fill >= 2/3 size, adjust size.  Normally, this doubles or
     * quaduples the size, but it's also possible for the dict to shrink
     * (if ma_fill is much larger than ma_used, meaning a lot of dict
     * keys have been * deleted).
     *
     * Quadrupling the size improves average dictionary sparseness
     * (reducing collisions) at the cost of some memory and iteration
     * speed (which loops over every possible entry).  It also halves
     * the number of expensive resize operations in a growing dictionary.
     *
     * Very large dictionaries (over 50K items) use doubling instead.
     * This may help applications with severe memory constraints.
     */
    if (!(mp->ma_used > n_used && mp->ma_fill*3 >= (mp->ma_mask+1)*2))
        return 0;
    return dictresize(mp, (mp->ma_used > 50000 ? 2 : 4) * mp->ma_used);
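
The resize rule quoted above can be modelled in a few lines of Python. This is a rough sketch under stated assumptions, not CPython's actual code: it assumes `dictresize` rounds its argument up to the smallest power of two (starting at `PyDict_MINSIZE`) that is strictly greater than it, and it only models insertions, so `ma_fill` equals `ma_used`. The function names, the 280-byte base and the 24-byte slot size are illustrative; the last two are inferred from the 64-bit output quoted in the first answer, not taken from the source.

```python
MINSIZE = 8  # PyDict_MINSIZE in the quoted header

def next_table_size(min_used):
    """Smallest power of two, starting at MINSIZE, strictly greater than min_used.
    Assumption: this mirrors how dictresize rounds its argument."""
    size = MINSIZE
    while size <= min_used:
        size <<= 1
    return size

def table_sizes(n_items):
    """Simulate insert-only growth and return the sequence of table sizes."""
    size = MINSIZE
    sizes = [size]
    for used in range(1, n_items + 1):
        fill = used                      # no deletions, so ma_fill == ma_used
        if fill * 3 >= size * 2:         # ma_fill*3 >= (ma_mask+1)*2
            factor = 2 if used > 50000 else 4
            size = next_table_size(factor * used)
            sizes.append(size)
    return sizes

print(table_sizes(5000))   # [8, 32, 128, 512, 2048, 8192]

# Assumed 64-bit layout: 280-byte dict object (8 slots inline),
# 24 bytes per slot once the table lives outside the object.
base, slot = 280, 24
print([base if s == MINSIZE else base + s * slot for s in table_sizes(100)])
# [280, 1048, 3352, 12568]
```

Under these assumptions the model reproduces the sizes printed in the first answer. With a 140-byte base and 12-byte slots (a 32-bit guess), the 8192-slot table gives 140 + 8192 * 12 = 98444, the figure from the question.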