Frequently updating stored data for a numerical experiment using Python

Question

I am doing a numerical experiment that requires a lot of iterations. After each iteration, I would like to save the data in a pickle or pickle-like file in case the program or a data structure fails. What is the best way to proceed? Here is the skeleton code:

import pickle

data_dict = {}                       # maybe a dictionary is not the best choice
for j in parameters:                 # j = (alpha, beta, gamma) and cycle through
    for k in number_of_experiments:  # lots of experiments (10^4)
        file = open('storage.pkl', 'ab')
        data = experiment()          # experiment returns some numerical value
                                     # experiment takes ~1 second, but the time
                                     # increases as parameters scale
        data_dict.setdefault(j, []).append(data)
        pickle.dump(data_dict, file)
        file.close()

Questions:

Is shelve a better choice here? Or is there some other Python library that I am not aware of?
I am using a data dict because it is easier to code and more flexible if I need to change something as I do more experiments. Would it be a huge advantage to use a pre-allocated array?
Does opening and closing the file on every iteration affect run time? I do this so that I can check on the progress, in addition to the text logs I have set up.


python pickle numerical-methods

Charlie 27 . '14 16:52

2

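Shelve is probably not a good choice here, however...

You might try klepto or joblib. Both are good at caching results and can use efficient storage formats.

Both joblib and klepto can save your results to a file on disk or to a directory. Both can also use the numpy internal storage format and/or compression when saving, and can save to memory-mapped files if you like.

If you use klepto, it takes the dictionary key as the filename and saves the value as the contents. With klepto, you can also pick whether to use pickle, json, or some other storage format.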

Python 2.7.7 (default, Jun  2 2014, 01:33:50) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import klepto
>>> data_dict = klepto.archives.dir_archive('storage', cached=False, serialized=True)     
>>> import string
>>> import random
>>> for j in string.ascii_letters:
...   for k in range(1000):
...     data_dict.setdefault(j, []).append([int(10*random.random()) for i in range(3)])
... 
>>>

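This creates a directory called storage that contains pickled files, one for each key of data_dict. There are keywords for using memmap files and for the compression level. With cached=False, as above, every write to data_dict goes directly to disk. If you use an in-memory cache instead, you write to memory each time and can call data_dict.dump() to write to disk whenever you choose, or you can set a memory limit and dump to disk when you hit it. You can also pick a caching strategy (such as lru or lfu) to decide which keys are purged from memory and dumped to disk.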

Get klepto here: https://github.com/uqfoundation

or get joblib here: https://github.com/joblib/joblib

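If you refactor, you could probably find a way to take advantage of a pre-allocated array. However, whether that helps may depend on the profile of how your code runs.

Does opening and closing files affect run time? Yes. If you use klepto, you can set the granularity of when you dump to disk, and then pick a trade-off between speed and intermediate storage of results.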
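The trade-off between speed and how often results reach the disk can also be sketched with the standard library alone. The following is only an illustration, not klepto's mechanism: the function names, the `save_every` knob, and the file name are hypothetical, and it assumes Python 3 (for `os.replace`). It overwrites the checkpoint every `save_every` results instead of appending a full copy of the dictionary on every iteration, and it writes through a temporary file so that a crash in the middle of a write does not corrupt the previous checkpoint.

```python
import os
import pickle
import tempfile


def save_checkpoint(obj, path):
    """Write obj to path atomically: dump to a temp file, then rename it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            pickle.dump(obj, tmp, protocol=pickle.HIGHEST_PROTOCOL)
        os.replace(tmp_path, path)  # atomic when both paths are on one filesystem
    except BaseException:
        os.remove(tmp_path)  # do not leave a half-written temp file behind
        raise


def run(parameters, n_experiments, experiment, path, save_every=100):
    """Run the sweep, checkpointing every `save_every` results (hypothetical knob)."""
    results = {}
    done = 0
    for params in parameters:
        for _ in range(n_experiments):
            results.setdefault(params, []).append(experiment(params))
            done += 1
            if done % save_every == 0:
                save_checkpoint(results, path)
    save_checkpoint(results, path)  # final write so the last few results are kept
    return results
```

With experiments that take about a second each, `save_every=1` is probably affordable; a larger value only starts to matter once the dictionary becomes expensive to serialize.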


Mike McKerns 28 . '14 22:56

newtover · Accepted Answer · 2014-06-27T17:09:57+0000

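Assuming you are using numpy for your numerical experiments, I would suggest numpy.savez instead of pickle.
Keep it simple, and optimize only if you find that the script runs longer than it should.
Opening and closing files does affect run time, but having a backup is better anyway.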
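A minimal sketch of what saving with numpy.savez could look like, assuming the results are kept as a dictionary of lists keyed by parameter tuples as in the question. The helper names and the key encoding are illustrative assumptions; numpy.savez only requires that each array be stored under a string name.

```python
import numpy as np


def save_results(path, results):
    """Save {(alpha, beta, gamma): [values, ...]} as one named array per key."""
    # np.savez stores arrays under string names, so encode each tuple as text.
    arrays = {"_".join(map(str, key)): np.asarray(vals)
              for key, vals in results.items()}
    np.savez(path, **arrays)


def load_results(path):
    """Load the archive back into a plain dict of name -> array."""
    with np.load(path) as archive:
        return {name: archive[name] for name in archive.files}
```

Calling `save_results('storage.npz', data_dict)` after each iteration rewrites the whole archive, so the file always holds one consistent snapshot rather than a growing series of appended copies.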

And I would use collections.defaultdict(list) instead of a plain dict with setdefault.
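For illustration, the defaultdict variant might look like the following sketch. The parameter tuples and the stand-in computation are placeholders for the real parameter sets and experiment().

```python
from collections import defaultdict
import pickle

results = defaultdict(list)  # missing keys start out as empty lists

for params in [(0.1, 0.2, 0.3), (0.4, 0.5, 0.6)]:    # placeholder parameter sets
    for trial in range(3):
        results[params].append(trial * sum(params))  # stand-in for experiment()

# Converting to a plain dict before pickling keeps the stored object simple.
blob = pickle.dumps(dict(results))
restored = pickle.loads(blob)
```

The only change from the question's code is that `results[params].append(...)` replaces `data_dict.setdefault(j, []).append(...)`.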
