How is a Python object stored in memory?

Let's say I have class A:

    class A(object):
        def __init__(self, x):
            self.x = x

        def __str__(self):
            return str(self.x)

And I use sys.getsizeof to find out how many bytes an instance of A takes:

    >>> import sys
    >>> sys.getsizeof(A(1))
    64
    >>> sys.getsizeof(A('a'))
    64
    >>> sys.getsizeof(A('aaa'))
    64

As the experiment above shows, the size of an A instance is the same regardless of what self.x is.

So I am wondering: how does Python store an object internally?

2 answers

It depends on the type of object, and also on the Python implementation :-)

In CPython, which is what most people use when they use python, all Python objects are represented by a C struct, PyObject. Everything that "stores an object" really stores a PyObject *. The PyObject struct holds the bare minimum of information: the object's type (a pointer to another PyObject) and its reference count (an integer of type Py_ssize_t). Types defined in C extend this struct with the extra information they need to store in the object itself, and sometimes allocate additional data separately.
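The two fields of the PyObject struct can be observed from Python itself. The snippet below is a minimal sketch, assuming a standard CPython build: type() reads back the type pointer, and sys.getrefcount reads the reference count. The names obj and extra_refs are only illustrative.

```python
import sys

obj = object()

# Every object carries a pointer to its type; type() reads it back.
obj_type = type(obj)

# sys.getrefcount reports the reference count. The call itself holds a
# temporary reference, so only the difference between two calls is meaningful.
before = sys.getrefcount(obj)
extra_refs = [obj, obj, obj]   # three more PyObject * pointing at obj
after = sys.getrefcount(obj)

print(obj_type, after - before)
```

Note that the list stores three pointers to the same object rather than three copies, which is why only the count changes.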

For example, tuples (implemented as a PyTupleObject "extending" the PyObject struct) store their length and the PyObject pointers they contain inside the struct itself (the struct contains a 1-length array in its definition, but the implementation allocates a block of memory of the right size to hold the PyTupleObject struct plus exactly as many items as the tuple should hold). In the same way, strings (PyStringObject) store their length, their cached hash value, some string-caching ("interning") bookkeeping, and the actual char * data. Tuples and strings are thus single blocks of memory.
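The "single block" layout of tuples shows up in sys.getsizeof: each extra item should cost exactly one pointer. This is a rough check, assuming CPython; tuple_size is a helper name made up for the example.

```python
import struct
import sys

POINTER_SIZE = struct.calcsize("P")  # size of a C pointer on this build

def tuple_size(n):
    """Size in bytes of a tuple holding n items (the items themselves excluded)."""
    return sys.getsizeof(tuple(range(n)))

# How much the tuple struct grows for each additional item.
growth = [tuple_size(n + 1) - tuple_size(n) for n in range(1, 6)]
print(POINTER_SIZE, growth)
```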

On the other hand, lists (PyListObject) store their length, a PyObject ** for their data, and another Py_ssize_t to keep track of how much room they have allocated for the data. Because Python stores PyObject pointers everywhere, you cannot grow a PyObject struct once it is allocated: doing so might require the struct to move, which would mean finding all the pointers to it and updating them. Because a list may need to grow, it has to allocate its data separately from the PyObject struct. Tuples and strings cannot grow, so they do not need this. Dicts (PyDictObject) work the same way, although they store the key, the value, and the cached hash value of the key, instead of just the items. Dicts also have some extra overhead to accommodate small dicts and specialized lookup functions.
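The separate, over-allocated data block of a list can be seen by appending in a loop. This is a sketch that assumes CPython, where id() is the address of the object struct and sys.getsizeof of a list includes the allocated pointer array; the exact growth pattern varies between versions.

```python
import sys

items = []
struct_address = id(items)   # in CPython, id() is the address of the struct
sizes = []

for i in range(64):
    items.append(i)
    sizes.append(sys.getsizeof(items))

# The reported size changes only when the data block is reallocated,
# which happens far less often than once per append.
resizes = len(set(sizes))
same_struct = id(items) == struct_address
print(resizes, same_struct)
```

The list struct stays where it is while its data block is reallocated behind it, so every existing pointer to the list remains valid.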

But these are all types written in C, and you can usually see how much memory they will use just by looking at the C source. Instances of classes defined in Python rather than C are not so easy. The simplest case, instances of classic classes, is not too difficult: such an instance is a PyObject that stores a PyObject * to its class (which is not the same thing as the type already stored in the PyObject struct), a PyObject * to its __dict__ attribute (which holds all the other instance attributes), and a PyObject * to its weakreflist (which is used by the weakref module and only initialized if necessary). The instance's __dict__ is usually unique to the instance, so when calculating the "memory size" of such an instance you usually want to count the size of the attribute dict as well. But it does not have to be specific to the instance! __dict__ can be assigned to just fine.
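This is also why sys.getsizeof gives the same number for every instance of a class: it measures only the instance struct, not the attribute dict or the values it refers to. The sketch below uses an ordinary class (classic classes exist only in Python 2) and a plain dict named shared, both chosen purely for illustration, to show that one __dict__ can serve two instances.

```python
import sys

class A(object):
    def __init__(self, x):
        self.x = x

small = A(1)
large = A("a" * 1000)

# getsizeof counts the instance struct only, not __dict__ or what x refers to.
same_shallow_size = sys.getsizeof(small) == sys.getsizeof(large)

# __dict__ does not have to be unique to an instance: it can be reassigned.
shared = {"x": 0}
first, second = A(1), A(2)
first.__dict__ = shared
second.__dict__ = shared
first.x = 99                  # visible through both instances

print(same_shallow_size, second.x)
```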

New-style classes complicate matters. Unlike with classic classes, instances of new-style classes are not separate C types, so they do not need to store the object's class separately. They do have room for the __dict__ and weakreflist references, but unlike classic instances they do not require the __dict__ attribute for arbitrary attributes. If the class (and all its base classes) uses __slots__ to define a strict set of attributes, and none of those attributes is named __dict__, the instance does not allow arbitrary attributes and no dict is allocated. On the other hand, the attributes defined by __slots__ have to be stored somewhere. This is done by storing the PyObject pointers for the values of those attributes directly in the PyObject struct, much as is done with types written in C. Each entry in __slots__ will thus take up one PyObject *, regardless of whether the attribute is set or not.
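The effect of __slots__ can be checked with sys.getsizeof. The following is a sketch under the assumption of a CPython build; OneSlot and ThreeSlots are hypothetical classes that differ only in the number of slots.

```python
import struct
import sys

POINTER_SIZE = struct.calcsize("P")  # size of a C pointer on this build

class OneSlot(object):
    __slots__ = ("a",)

class ThreeSlots(object):
    __slots__ = ("a", "b", "c")

empty = ThreeSlots()                     # no slot assigned yet
filled = ThreeSlots()
filled.a, filled.b, filled.c = 1, 2, 3

# Two extra slots should cost two extra pointers in the instance struct.
per_slot = (sys.getsizeof(ThreeSlots()) - sys.getsizeof(OneSlot())) // 2
same_size = sys.getsizeof(empty) == sys.getsizeof(filled)
has_dict = hasattr(empty, "__dict__")

try:
    empty.other = 1                      # not listed in __slots__
    arbitrary_allowed = True
except AttributeError:
    arbitrary_allowed = False

print(per_slot, same_size, has_dict, arbitrary_allowed)
```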

All that said, the problem remains that since everything in Python is an object, and everything that holds an object just holds a reference, it is sometimes very difficult to draw the line between objects. Two objects can refer to the same bit of data. They may hold the only two references to that data. Getting rid of both objects also gets rid of the data. Do they both own the data? Does only one of them own it, and if so, which one? Or would you say that each owns half the data, even though getting rid of one object does not release half the data? Weakrefs can make this even more complicated: two objects can refer to the same data, but deleting one of the objects may cause the other object to also drop its reference to that data, so that the data is cleaned up after all.
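The ownership question can be made concrete with a weak reference used as a probe. Data and Holder below are made-up classes, and the immediate cleanup relies on CPython's reference counting; other implementations may free the object later.

```python
import weakref

class Data(object):
    """Stand-in for some shared payload (hypothetical)."""

class Holder(object):
    def __init__(self, data):
        self.data = data

payload = Data()
first, second = Holder(payload), Holder(payload)
probe = weakref.ref(payload)   # observes the payload without owning it
del payload                    # only the two holders keep it alive now

del first
alive_after_one = probe() is not None    # second still refers to the data

del second
alive_after_both = probe() is not None   # last reference is gone

print(alive_after_one, alive_after_both)
```

Neither holder "owns" the payload on its own: dropping one frees nothing, and dropping both frees all of it.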

Fortunately, the common case is fairly easy to figure out. There are memory debuggers for Python that do a reasonable job of keeping track of these things, like heapy. And as long as your class (and its base classes) is reasonably simple, you can make an educated guess at how much memory it will take up, especially in large numbers. If you really want to know the exact sizes of your data structures, consult the CPython source; most built-in types are simple structs described in Include/<type>object.h and implemented in Objects/<type>object.c. The PyObject struct itself is described in Include/object.h. Just keep in mind: it is pointers all the way down, and those take up room too.
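For a rough estimate without a memory debugger, one common approach is to follow references by hand and add up sys.getsizeof for each object reached. The total_size function below is a hypothetical helper, not part of the standard library or of heapy; it only follows dicts, the basic containers, and instance __dict__ attributes, and it ignores __slots__ values.

```python
import sys

def total_size(obj, seen=None):
    """Rough recursive size: the object plus everything it refers to."""
    if seen is None:
        seen = set()
    if id(obj) in seen:          # count shared objects only once
        return 0
    seen.add(id(obj))

    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(total_size(k, seen) + total_size(v, seen)
                    for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(total_size(item, seen) for item in obj)
    elif hasattr(obj, "__dict__"):
        size += total_size(vars(obj), seen)   # follow instance attributes
    return size

class A(object):
    def __init__(self, x):
        self.x = x

shallow = sys.getsizeof(A("aaa"))
deep_short = total_size(A("aaa"))
deep_long = total_size(A("a" * 1000))
print(shallow, deep_short, deep_long)
```

Unlike the shallow figure, the deep figure changes with the size of self.x, because the attribute dict and the string it refers to are included.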


In the case of a new class instance, getsizeof() returns the size of the reference to the PyObject that is returned by the C function PyInstance_New().

If you want a list of all the object sizes, check this.
