Do attributes declared in a function called by __init__ still use a key-sharing dictionary?

I've always tried to declare class attributes inside __init__ for clarity and organizational reasons. I recently learned that strictly following this practice has additional, non-aesthetic benefits too, thanks to PEP 412 being added in Python 3.3. In particular, if all attributes are defined in __init__, then objects can save space by sharing their keys and hashes.

My question is, does object key sharing happen when attributes are declared in a function that is called by __init__?

Here is an example:

    class Dog:
        def __init__(self):
            self.height = 5
            self.weight = 25


    class Cat:
        def __init__(self):
            self.set_shape()

        def set_shape(self):
            self.height = 2
            self.weight = 10

In this case, all instances of Dog will share the height and weight keys. Do instances of Cat also share the height and weight keys (among themselves, not with Dog, of course)?

As an aside, how would you test this?

Note that Brandon Rhodes said this about key sharing in his talk The Dictionary Even Mightier:

"If you add a single key that is not in the prototypical set of keys, you lose the key sharing."

python dictionary python-internals
2 answers

Does object key sharing happen when attributes are declared in a function that is called by __init__?

Yes. Regardless of where you set the attributes from, as long as both instances have the same set of keys after initialization, the instance dictionaries use a shared-key dictionary implementation. Both cases presented have a reduced memory footprint.

You can test this by using sys.getsizeof to get the size of the instance dictionary and then comparing it with a similar dict created from it. The implementation of dict.__sizeof__ distinguishes between the two cases and returns different sizes:

    # on a 64-bit build of Python 3.6.1
    import sys

    c = Cat()  # an instance of Cat from the question
    print(sys.getsizeof(vars(c)))        # 112
    print(sys.getsizeof(dict(vars(c))))  # 240

So, to find out, all you have to do is compare them.
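As a minimal sketch of that comparison, the snippet below applies it to the Dog and Cat classes from the question. The helper name dict_sizes is an assumption made for this illustration, not something from the standard library. The numbers, and even which of the two is larger, depend on the CPython version, so treat the result as a heuristic and not as proof.

```python
import sys


class Dog:
    def __init__(self):
        self.height = 5
        self.weight = 25


class Cat:
    def __init__(self):
        self.set_shape()

    def set_shape(self):
        self.height = 2
        self.weight = 10


def dict_sizes(instance):
    """Return (size of the instance dict, size of a plain dict copy of it).

    Hypothetical helper for illustration only.
    """
    attrs = vars(instance)
    return sys.getsizeof(attrs), sys.getsizeof(dict(attrs))


for cls in (Dog, Cat):
    # Create two instances per class so there is something to share keys with.
    first, second = cls(), cls()
    print(cls.__name__, dict_sizes(first), dict_sizes(second))
```

If Dog and Cat print the same pair of sizes on your interpreter, the helper method made no difference to how the instance dictionaries are stored.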

As for your edit:

"If you add a single key that is not in the prototypical set of keys, you lose the key sharing."

That's correct. It is one of the two things I have found (so far) that break the use of shared keys:

  • Using a non-string key in the instance dict. This can only be done in silly ways. (You could do it using vars(inst).update.)
  • The contents of the dictionaries of two instances of the same class diverging; this can be done by altering the instance dictionaries. (A single key that is not in the prototypical set of keys is added.)

    I'm not certain whether this happens when a single key is added; this is an implementation detail that might change. (Addendum: see Martijn's comments.)

For a related discussion, see a Q&A I did here: Why is the __dict__ of instances so small in Python 3?

Both of these things will force CPython to use a "normal" dictionary instead. This, of course, is an implementation detail that you cannot rely on. You may or may not find it in other Python implementations and future versions of CPython.


I think you are referring to the following paragraph of the PEP (in the Split-Table dictionaries section):

"When resizing a split dictionary it is converted to a combined table. If resizing is as a result of storing an instance attribute, and there is only one instance of a class, then the dictionary will be re-split immediately. Since most OO code will set attributes in the __init__ method, all attributes will be set before a second instance is created and no more resizing will be necessary as all further instance dictionaries will have the correct size."

So the dictionary keys will remain shared, no matter what additions are made, as long as they happen before a second instance can be created. Doing so in __init__ is the most logical way to achieve this.

This does not mean that attributes set later are not shared; they can still be shared between instances, as long as you don't cause any of the dictionaries to be combined. So after you create a second instance, the keys stop being shared only if one of the following happens:

  • a new attribute causes the dictionary to be resized.
  • a new attribute is not a string attribute (dictionaries are highly optimized for the common all-keys-are-strings case).
  • an attribute is inserted in a different order; for example, a.foo = None is set first and then b.bar = None. Here b.bar has an incompatible insertion order, since the shared dictionary has foo first.
  • an attribute is deleted. This kills sharing even for one instance. Don't delete attributes if you care about shared dictionaries.

So once you have two instances (and two dictionaries sharing keys), the keys won't be re-split, but as long as you don't trigger any of the cases above, your instances will continue to share keys.
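The triggers listed above can be tried out with a small sketch like the following. The class Point, the three action functions, and the looks_shared helper are all hypothetical names made up for this illustration; looks_shared is only the size-comparison heuristic, and what it prints differs between CPython versions, so no expected output is shown.

```python
import sys


def looks_shared(instance):
    """Heuristic only: is the instance dict smaller than a plain copy of it?"""
    attrs = vars(instance)
    return sys.getsizeof(attrs) < sys.getsizeof(dict(attrs))


class Point:  # hypothetical class, used only for this demo
    def __init__(self):
        self.x = 0
        self.y = 0


def non_string_key(a, b):
    # A non-string key can only be added through the dict itself.
    vars(a).update({1: "one"})


def different_order(a, b):
    # a gets foo first; b then gets bar where the shared keys expect foo.
    a.foo = None
    b.bar = None


def delete_attribute(a, b):
    # Deleting an attribute from just one of the instances.
    del a.x


for action in (non_string_key, different_order, delete_attribute):
    a, b = Point(), Point()  # a fresh pair for every experiment
    before = looks_shared(a), looks_shared(b)
    action(a, b)
    after = looks_shared(a), looks_shared(b)
    print(f"{action.__name__}: before={before} after={after}")
```

The resize trigger is left out here because the number of attributes needed to force a resize is itself version-dependent.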

It also means that delegating the setting of attributes to a helper method called from __init__ is not going to affect the scenario described above; those attributes are still set before a second instance is created. After all, __init__ cannot return before that helper method has returned.
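To make the ordering concrete, here is a small sketch that logs when each step happens. The events list and its messages are assumptions added for illustration; the Cat class mirrors the one in the question.

```python
events = []  # hypothetical log, only for illustration


class Cat:
    def __init__(self):
        events.append("__init__ starts")
        self.set_shape()
        events.append("__init__ returns")

    def set_shape(self):
        self.height = 2
        self.weight = 10
        # Record which attributes exist by the time the helper finishes.
        events.append("set_shape sets " + ", ".join(vars(self)))


first = Cat()
events.append("first instance ready")
second = Cat()

for line in events[:4]:
    print(line)
# __init__ starts
# set_shape sets height, weight
# __init__ returns
# first instance ready
```

Both attributes exist before the first instance is even handed back to the caller, so they are necessarily in place before any second instance is constructed.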

In other words, you should not worry too much about where you set your attributes. Setting them in the __init__ method is the easiest way to avoid the scenarios that combine the table, but any attribute set before a second instance is created is guaranteed to be part of the shared keys.

As for how to test this: look at the memory size with the sys.getsizeof() function; if creating a copy of the __dict__ mapping results in a larger object, the __dict__ table was shared:

    import sys

    def shared(instance):
        return sys.getsizeof(vars(instance)) < sys.getsizeof(dict(vars(instance)))

Quick demo:

    >>> class Foo:
    ...     pass
    ...
    >>> a, b = Foo(), Foo()  # two instances
    >>> shared(a), shared(b)  # they both share the keys
    (True, True)
    >>> a.bar = 'baz'  # adding a single key
    >>> shared(a), shared(b)  # no change, the keys are still shared!
    (True, True)
    >>> a.spam, a.ham, a.monty, a.eric = (
    ...     'eggs', 'eggs and spam', 'python',
    ...     'idle')  # more keys still
    >>> shared(a), shared(b)  # no change, the keys are still shared!
    (True, True)
    >>> a.holy, a.bunny, a.life = (
    ...     'grail', 'of caerbannog',
    ...     'of brian')  # more keys, resize time
    >>> shared(a), shared(b)  # oops, we killed it
    (False, False)

Only when the threshold was reached (for an empty dictionary with 8 spare slots, the resize takes place when you add a 6th key) did the dictionaries lose their shared property.

Dictionaries are resized when they are about 2/3 full, and a resize generally doubles the table size. So the next resize will take place when an 11th key is added, then at 22, then 43, and so on. For a large instance dictionary you therefore have a lot more breathing room.
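The arithmetic behind those numbers can be reproduced with a simple model: a table with n slots holds floor(2n/3) keys, and each resize doubles n. The function name resize_points is an assumption for this sketch, and the model follows only the description above, not CPython's exact growth policy, which has changed between versions.

```python
def resize_points(initial_slots=8, count=4):
    """Return the key counts at which a resize would happen in this model."""
    points = []
    slots = initial_slots
    for _ in range(count):
        usable = (2 * slots) // 3  # keys that fit before the table is 2/3 full
        points.append(usable + 1)  # adding one more key forces the resize
        slots *= 2                 # the resize doubles the table size
    return points


print(resize_points())  # [6, 11, 22, 43]
```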
