Python: incrementally marshal / pickle an object?

I have a large object that I would like to serialize to disk. I find that marshal works well and is fast.

Right now I am building the whole object and then calling marshal.dump. I would like to avoid holding the large object in memory if possible - I would rather write it out gradually as I create it. Is that possible?

The object is pretty simple, a dictionary of arrays.

+4
4 answers

The bsddb functions 'hashopen' and 'btopen' provide a persistent dictionary-like interface. Perhaps you could use one of them instead of an ordinary dictionary to gradually serialize your arrays to disk?

    import bsddb
    import marshal

    db = bsddb.hashopen('file.db')
    db['array1'] = marshal.dumps(array1)
    db['array2'] = marshal.dumps(array2)
    ...
    db.close()

To get the arrays back:

    db = bsddb.hashopen('file.db')
    array1 = marshal.loads(db['array1'])
    ...
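Note that bsddb was removed from the standard library in Python 3. A minimal sketch of the same approach using the stdlib dbm module instead; the file name and arrays here are illustrative:

```python
import dbm
import marshal

# Hypothetical arrays standing in for the question's data.
array1 = [1, 2, 3]
array2 = [4.0, 5.0]

# Write each array to the database as soon as it is ready,
# so only one array needs to be in memory at a time.
with dbm.open('file.db', 'c') as db:
    db['array1'] = marshal.dumps(array1)
    db['array2'] = marshal.dumps(array2)

# Read a single array back without loading the others.
with dbm.open('file.db', 'r') as db:
    restored = marshal.loads(db['array1'])
```

As with the bsddb version, each value is an independent marshaled blob, so each array can be written and discarded as it is built.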
+4

If your object is just a dictionary of lists, then you can use the shelve module. It provides a dictionary-like interface in which keys and values are stored in a database file rather than in memory. One limitation that may or may not affect you is that keys in Shelf objects must be strings. Storage will be more efficient if you specify protocol=-1 when creating the Shelf object so that it uses a more efficient binary pickle representation.
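A minimal sketch of this approach; the file name and keys are illustrative:

```python
import shelve

# Each assignment writes one value to the database file on disk,
# so the full dictionary never has to sit in memory at once.
with shelve.open('big_object.shelf', protocol=-1) as shelf:
    shelf['measurements'] = [1.5, 2.5, 3.5]
    shelf['labels'] = ['a', 'b']

# Later, load only the key you need.
with shelve.open('big_object.shelf') as shelf:
    loaded = shelf['measurements']
```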

+4

It really depends on how you build the object. Is it an array of sub-objects? Then you can marshal/pickle each array element as you create it. Is it a dictionary? The same idea applies: marshal/pickle each key/value pair individually.

If it is just a big, complex, hairy object, you may want to marshal each piece of the object separately, and then re-apply whatever structure your build process created when you read it back in.
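One way to sketch this piece-by-piece idea, assuming a dictionary built up pair by pair (file name and helper names are illustrative):

```python
import pickle

def build_and_dump(path, items):
    """Pickle each (key, value) pair as it is produced,
    so only one piece is ever held in memory."""
    with open(path, 'wb') as f:
        for key, value in items:
            pickle.dump((key, value), f)

def load_and_rebuild(path):
    """Re-apply the dictionary structure while reading back."""
    result = {}
    with open(path, 'rb') as f:
        while True:
            try:
                key, value = pickle.load(f)
            except EOFError:
                break
            result[key] = value
    return result

build_and_dump('pieces.pkl', [('a', [1, 2]), ('b', [3])])
rebuilt = load_and_rebuild('pieces.pkl')
```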

0

You should be able to dump the object to a file piece by piece. Two design questions need settling:

  • How is the object built while it is in memory?
  • In what form do you need the data when it comes back out of memory?

If your build process fills in the entire array associated with a given key at one time, you could simply dump each key: array pair to the file as a separate dictionary:

    big_hairy_dictionary['sample_key'] = pre_existing_array
    with open('central_file', 'ab') as f:  # marshal.dump needs a file object, not a filename
        marshal.dump({'sample_key': big_hairy_dictionary['sample_key']}, f)

Then, when loading it back, each call to marshal.load on the open 'central_file' will return one dictionary that you can use to update a central dictionary. But this is only really useful if, when you need the data back, you want to process 'central_file' one key at a time.
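A sketch of that read-back loop, with the file contents written here purely for illustration:

```python
import marshal

# Write two per-key chunks, as the build process above would.
with open('central_file', 'wb') as f:
    marshal.dump({'key_a': [1, 2]}, f)
    marshal.dump({'key_b': [3]}, f)

# Read each chunk back and merge it into one central dictionary.
central = {}
with open('central_file', 'rb') as f:
    while True:
        try:
            chunk = marshal.load(f)
        except EOFError:  # end of file: no more chunks
            break
        central.update(chunk)
```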

Alternatively, if you fill in each array element by element in a defined order, try:

    big_hairy_dictionary['sample_key'].append(single_element)
    with open('marshaled_files/sample_key', 'ab') as f:  # one file per key
        marshal.dump(single_element, f)

Then, when you load it back, you don't have to build the entire dictionary to get what you need; you simply call marshal.load on the open 'marshaled_files/sample_key' file until it raises EOFError, and you have everything associated with that key.
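A sketch of the per-key round trip; the file name and elements are illustrative:

```python
import marshal

# Append elements one at a time, as the build loop above would.
with open('sample_key.marshal', 'wb') as f:
    for element in [10, 20, 30]:
        marshal.dump(element, f)

# Read them back until the file is exhausted.
elements = []
with open('sample_key.marshal', 'rb') as f:
    while True:
        try:
            elements.append(marshal.load(f))
        except EOFError:  # marshal.load raises EOFError at end of file
            break
```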

0
