How to pickle a python dictionary into MySQL?

I looked through several SO questions on how to pickle a python object and store it in a database. The information I collected is:

  • import pickle or import cPickle . Import the latter if performance is an issue.
  • Suppose mydict is a python dictionary (or some other python object): pickled = pickle.dumps(mydict) .
  • Store pickled in a MySQL BLOB column, using whichever module you already use to talk to the database.
  • Retrieve it again later and use pickle.loads(pickled) to restore the python dictionary (a rough round-trip sketch follows below).
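Put together, I imagine the round trip would look roughly like this — a sketch only, assuming the MySQLdb driver and a made-up geocode_cache table with a BLOB column:

 import pickle
 import MySQLdb  # any DB-API driver should work the same way

 # assumed schema: CREATE TABLE geocode_cache (id INT PRIMARY KEY, response BLOB)
 conn = MySQLdb.connect(host='localhost', user='user', passwd='pass', db='mydb')
 cur = conn.cursor()

 response = {'results': [{'lat': 52.5, 'lng': 13.4}]}  # stand-in for a geocoder reply
 pickled = pickle.dumps(response)

 cur.execute("INSERT INTO geocode_cache (id, response) VALUES (%s, %s)", (1, pickled))
 conn.commit()

 cur.execute("SELECT response FROM geocode_cache WHERE id = %s", (1,))
 restored = pickle.loads(cur.fetchone()[0])
 assert restored == response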

I just want to make sure I get it right. Am I missing something critical? Are there any side effects? Is it really that simple?

Background information: the only thing I want to do is store Google geocoder responses, which in my case are nested python dictionaries. I only use a small part of the response object, and I don't know whether I will ever need more of it. That is why I thought about saving the responses, to spare myself the repetition of several million requests.

3 answers

It really is that simple ... until you need your DB to know anything about the dictionary. If you need any structured access to the contents of the dictionary, you will have to get more involved.
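A common middle ground, if you need to query on one or two fields, is to promote those fields to real columns and keep the rest of the dict opaque. A sketch (the table and field names here are made up):

 import pickle

 # hypothetical schema:
 #   CREATE TABLE responses (id INT PRIMARY KEY,
 #                           address VARCHAR(255),  -- the field you query on
 #                           blob_data BLOB)        -- everything else, opaque

 def save_response(cur, rid, response):
     # index what you search on; pickle the rest
     cur.execute("INSERT INTO responses (id, address, blob_data) VALUES (%s, %s, %s)",
                 (rid, response.get('formatted_address', ''), pickle.dumps(response)))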

Another angle is what you intend to put into the dict. Pickle's serialization is pretty smart and handles most cases without any need for custom support. However, when it fails, it can be very difficult to work out what went wrong. So, if you can, restrict the contents of the dict to Python's built-in types. If you start adding instances of custom classes, keep them simple custom classes that don't do anything fancy with attribute storage or access. And be wary of adding instances of classes or types from extension modules. In general, if you run into hard-to-understand problems with pickling or unpickling, look at the non-built-in types in the dict.
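A quick illustration of where the line falls (a sketch; the Simple class is made up):

 import pickle

 # built-in types round-trip without trouble
 pickle.dumps({'n': 1, 's': 'x', 'nested': {'k': (3, 4), 'l': [1, 2]}})

 class Simple(object):  # a plain custom class also pickles fine
     def __init__(self):
         self.x = 1

 pickle.dumps({'obj': Simple()})

 try:
     pickle.dumps({'f': lambda x: x})  # lambdas (and many exotic objects) do not
 except pickle.PicklingError as e:
     print(e)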


If speed really is important: I just ran a test loading a large python dictionary (35 MB) from a pickle versus SELECTing from a MySQL table with all the keys and values stored as rows:

Pickle method:

 import time, pickle
 t1 = time.clock()
 f = open('story_data.pickle', 'rb')
 s = pickle.load(f)
 print time.clock() - t1

MySQL method:

 import time
 import database as db  # the author's own MySQL helper module
 t1 = time.clock()
 data, msg = db.mysql("""SELECT id, story FROM story_data;""")
 data_dict = dict([(int(x), y.split(',')) for x, y in data])
 print time.clock() - t1

Output: pickle method: 32.0785171704, mysql method: 3.25916336479

If a tenfold speedup matters to you, the database structure probably doesn't either. Note that I split all the comma-separated data into lists as the values for 36,000 keys, and it still only takes 3 seconds. So I have given up on pickle for large data sets: the rest of the 400-line program I was using took about 3 seconds, while loading the pickle took 32.
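For reference, db.mysql above is the author's own helper; with a plain DB-API driver such as MySQLdb the equivalent would be roughly the following (connection parameters are made up):

 import time
 import MySQLdb

 conn = MySQLdb.connect(host='localhost', user='user', passwd='pass', db='stories')
 cur = conn.cursor()

 t1 = time.time()
 cur.execute("SELECT id, story FROM story_data")
 data_dict = dict((int(x), y.split(',')) for x, y in cur.fetchall())
 print(time.time() - t1)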

Also note:

cPickle works just like pickle and is about 50% faster.
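The usual import idiom on Python 2 (in Python 3, the C implementation was folded into the plain pickle module):

 try:
     import cPickle as pickle  # C implementation, noticeably faster
 except ImportError:
     import pickle             # pure-python fallback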

Don't try to pickle a class full of dictionaries and save it in MySQL: it did not restore correctly, at least not for me.


If you have nested dictionaries, you have to be careful. Not every python object pickles (and you can stuff any object into a dict as a value). Even worse, even fewer python objects can be converted to strings and stored in SQL.

However, if you use klepto , serialization and storage in a database are pretty transparent, and it works for most python objects.

Let's build some typical python objects into dicts:

 >>> class Foo(object):
 ...   def bar(self, x):
 ...     return self.y + x
 ...   y = 1
 ...
 >>> d1 = {'a': min, 'b': lambda x:x**2, 'c': [1,2,3], 'd': Foo()}
 >>> f = Foo(); f.y = 100
 >>> d2 = {'a': max, 'b': lambda x:x**3, 'c': [2,1,3], 'd': f}

Now build a nested dict and dump it to a MySQL archive.

 >>> import klepto
 >>> a = klepto.archives.sql_archive('mysql://user:pass@localhost/foo', dict={'d1':d1, 'd2':d2})
 >>> a.dump()

Now we delete our interface to the archive... and make a new one. load loads all the objects into memory.

 >>> del a
 >>> b = klepto.archives.sql_archive('mysql://user:pass@localhost/foo')
 >>> b.load()

Now we access the in-memory copies of the objects.

 >>> b['d1']
 {'a': <built-in function min>, 'c': [1, 2, 3], 'b': <function <lambda> at 0x1037ccd70>, 'd': <__main__.Foo object at 0x103938ed0>}
 >>> b['d1']['b'](b['d1']['d'].bar(1))
 4
 >>> b['d2']['b'](b['d2']['d'].bar(1))
 1030301
 >>>

We exit python... and then start a fresh session. This time we choose cached=False , so we interact with the database directly.

 dude@hilbert>$ python
 Python 2.7.10 (default, May 25 2015, 13:16:30)
 [GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.40)] on darwin
 Type "help", "copyright", "credits" or "license" for more information.
 >>> import klepto
 >>> b = klepto.archives.sql_archive('mysql://user:pass@localhost/foo', cached=False)
 >>> b['d2']['b'](b['d2']['d'].bar(1))
 1030301
 >>> b['d1']['b'](b['d1']['d'].bar(1))
 4
 >>>

klepto uses sqlalchemy , so it works with several database backends... and, in addition, provides the same dict-based interface for storage on disk (in a file or a directory).
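For instance, the same pattern against a file on disk looks roughly like this (a sketch; the filename is made up):

 >>> import klepto
 >>> c = klepto.archives.file_archive('geocache.pkl', cached=False)  # backed by a file
 >>> c['d1'] = {'a': 1}  # with cached=False, written straight through to disk
 >>> c['d1']
 {'a': 1}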

